(54) Singing voice synthesis

(57) There are provided a singing voice-synthesizing method and apparatus capable of synthesizing natural singing voices close to human singing voices based on performance data input in real time. Performance data is inputted for each phonetic unit constituting a lyric, to supply phonetic unit information, singing-starting time point information, singing length information, etc. thereof. The singing-starting time point information represents the actual singing-starting time point. Each performance data is inputted in timing earlier than the actual singing-starting time point, and has its phonetic unit information converted to a phonetic unit transition time length. The phonetic unit transition time length is formed by a first phoneme generation time length and a second phoneme generation time length, for a phonetic unit formed by a first phoneme and a second phoneme. By using the phonetic unit transition time, the singing-starting time point information, and the singing length information, the singing-starting time points and singing duration times of the first and second phonemes are determined. The singing-starting time point of a consonant (first phoneme) is set to be earlier than the actual singing-starting time point. The singing-starting time point of a vowel (second phoneme) is made coincident with or earlier or later than the actual singing-starting time point. In the singing voice synthesis, for each phoneme, a singing voice is generated at the determined singing-starting time point and continues to be generated for the determined singing duration time. State transition characteristics and effects characteristics may be controlled according to input control information.
Description

BACKGROUND OF THE INVENTION 
Field of the Invention 

[0001] This invention relates to a singing voice-synthesizing method and apparatus for synthesizing singing voices based on performance data being input in real time, and a storage medium storing a program for executing the method.

Prior Art 

[0002] Conventionally, a singing voice-synthesizing 
method of the above-mentioned kind has been pro- 
posed which makes the rise time of a phoneme to be 
sounded first (first phoneme) in accordance with a note- 
on signal based on performance data shorter than the 
rise time of the same phoneme when it is sounded in 
succession to another phoneme during the note-on pe- 
riod (see e.g. Japanese Laid-Open Patent Publication 
(Kokai) No. 10-49169). 

[0003] FIG. 40A shows consonant singing-starting timing and vowel singing-starting timing of human singing, and this example shows a case in which words of a song, "sa" - "i" - "ta", are sung at the respective pitches of "C3 (do)", "D3 (re)", and "E3 (mi)". In FIG. 40A, phonetic units each formed by a combination of a consonant and a vowel, such as "sa" and "ta", are produced such that the consonant starts to be sounded earlier than the vowel.

[0004] On the other hand, FIG. 40B shows singing-starting timing of singing voices synthesized by the above-described conventional singing voice-synthesizing method. In this example, the same words of the lyric as in FIG. 40A are sung. Actual singing-starting time points T1 to T3 indicate respective starting time points at which singing voices start to be generated in response to respective note-on signals. According to the conventional method, when the singing voice of "sa" is generated, the singing-starting time point of the consonant "s" is set equal to or coincident with the actual singing-starting time point T1, and the amplitude level of the consonant "s" is rapidly increased from the time point T1 so as to avoid giving an impression of the singing voice being delayed compared with instrument sound (accompaniment sound).

[0005] The conventional singing voice-synthesizing 
method suffers from the following problems: 

(1) The vowel singing-starting time points of the human singing shown in FIG. 40A approximately correspond to the actual singing-starting time points (note-on time points) in the singing voice synthesis shown in FIG. 40B. However, in the case of FIG. 40B, the consonant singing-starting time points are set equal to the respective note-on time points, and at the same time the rise time of each consonant (first phoneme) is shortened, so that compared with the FIG. 40A case, the singing-starting timing and singing duration time become unnatural.

(2) Information of a phonetic unit is transmitted im- 
mediately before a note-on time point of the phonet- 
ic unit, and the singing voice corresponding to the 
information of the phonetic unit starts to be gener- 
ated at the note-on time point. Therefore, it is im- 
possible to start generation of the singing voice ear- 
lier than the note-on time point. 

(3) The singing voice is not controlled in respect of 
state transitions, such as an attack (rise) portion, 
and a release (fall) portion. This makes it impossible 
to synthesize more natural singing voices. 

(4) The singing voice is not controlled in respect of effects, such as vibrato. This makes it impossible to synthesize more natural singing voices.



SUMMARY OF THE INVENTION

[0006] It is an object of the present invention to provide a singing voice-synthesizing method and apparatus which is capable of synthesizing natural singing voices close to human singing voices based on performance data being input in real time, and a storage medium storing a program for executing the method.
[0007] To attain the above object, according to a first aspect of the invention, there is provided a singing voice-synthesizing method comprising the steps of inputting phonetic unit information representative of a phonetic unit, time information representative of a singing-starting time point, and singing length information representative of a singing length, in timing earlier than the singing-starting time point, for a singing phonetic unit including a sequence of a first phoneme and a second phoneme, generating a phonetic unit transition time length formed by a generation time length of the first phoneme and a generation time length of the second phoneme, based on the inputted phonetic unit information, determining a singing-starting time point and a singing duration time of the first phoneme and a singing-starting time point and a singing duration time of the second phoneme, based on the generated phonetic unit transition time length, the inputted time information and singing length information, and starting generation of a first singing voice and a second singing voice formed by the first phoneme and the second phoneme at the singing-starting time point of the first phoneme and the singing-starting time point of the second phoneme, respectively, and continuing generation of the first singing voice and the second singing voice for the singing duration time of the first phoneme and the singing duration time of the second phoneme, respectively.

[0008] Preferably, the determining step includes setting the singing-starting time point of the first phoneme to a time point earlier than the singing-starting time point represented by the time information.
[0009] According to this singing voice-synthesizing method, the phonetic unit information, the time information, and the singing length information are inputted in timing earlier than the singing-starting time point represented by the time information, and a phonetic unit transition time length is formed based on the phonetic unit information. Further, a singing-starting time point and a singing duration time of the first phoneme and a singing-starting time point and a singing duration time of the second phoneme are determined based on the generated phonetic unit transition time length. As a result, as to the first and second phonemes, it is possible to determine desired singing-starting time points before or after the singing-starting time point represented by the time information, or determine singing duration times different from the singing length represented by the singing length information, whereby natural singing sounds can be produced as the first and second singing phonetic units. For example, if the singing-starting time point of the first phoneme is set to a time point earlier than the singing-starting time point represented by the time information, it is possible to make the rise of a consonant sufficiently earlier than the rise of a vowel to thereby synthesize singing voices close to human singing voices.
[0010] To attain the above object, according to a second aspect of the invention, there is provided a singing voice-synthesizing method comprising the steps of inputting phonetic unit information representative of a phonetic unit, time information representative of a singing-starting time point, and singing length information representative of a singing length, for a singing phonetic unit, generating a state transition time length corresponding to a rise portion, a note transition portion, or a fall portion of the singing phonetic unit, based on the inputted phonetic unit information, and generating a singing voice formed by the phonetic unit, based on the phonetic unit information, the time information, and the singing length information which have been inputted, the generating step including adding a change in at least one of pitch and amplitude to the singing voice during a time period corresponding to the generated state transition time length.

[0011] According to this singing voice-synthesizing method, the state transition time length is generated based on the inputted phonetic unit, and a change in at least one of pitch and amplitude is added to the singing voice during a time period corresponding to the generated state transition time length. This makes it possible to synthesize natural singing voices with feelings of attack, note transition, or release.
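As an illustration of adding an amplitude change during a state transition time length, the following is a minimal sketch for the rise (attack) portion only. The linear ramp shape and the names `attack_gain`, `note_on_ms`, and `attack_ms` are assumptions made for this example; the text above does not specify the shape of the change.

```python
def attack_gain(t_ms: float, note_on_ms: float, attack_ms: float) -> float:
    """Return an amplitude gain in [0, 1] for time t_ms.

    Assumption: the gain rises linearly from 0 to 1 over the attack
    (rise) state transition time length that starts at note_on_ms.
    """
    if t_ms <= note_on_ms:
        return 0.0  # before or at the start point: silent
    if t_ms >= note_on_ms + attack_ms:
        return 1.0  # transition finished: full amplitude
    return (t_ms - note_on_ms) / attack_ms  # inside the transition period
```

A release (fall) portion could be sketched the same way with the ramp reversed, and a pitch change could reuse the same time window.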

[0012] To attain the above object, according to a third aspect of the invention, there is provided a singing voice-synthesizing apparatus comprising an input section that inputs phonetic unit information representative of a phonetic unit, time information representative of a singing-starting time point, and singing length information representative of a singing length, in timing earlier than the singing-starting time point, for a phonetic unit including a sequence of a first phoneme and a second phoneme, a storage section that stores a phonetic unit transition time length formed by a generation time length of the first phoneme and a generation time length of the second phoneme, a readout section that reads out the phonetic unit transition time length from the storage section based on the phonetic unit information inputted by the input section, a calculating section that calculates a singing-starting time point and a singing duration time of the first phoneme, and a singing-starting time point and a singing duration time of the second phoneme, based on the phonetic unit transition time length read by the readout section and the time information and the singing length information which have been inputted by the input section, and a singing voice-synthesizing section that starts generation of a first singing voice and a second singing voice formed by the first phoneme and the second phoneme at the singing-starting time point of the first phoneme and the singing-starting time point of the second phoneme calculated by the calculating section, respectively, and continues generation of the first singing voice and the second singing voice for the singing duration time of the first phoneme and the singing duration time of the second phoneme calculated by the calculating section, respectively.
[0013] This singing voice-synthesizing apparatus implements the singing voice-synthesizing method according to the first aspect of the invention, and hence the same advantageous effects described as to this method can be obtained. Further, since the apparatus is configured such that the phonetic unit transition time length is read from the storage section, the construction of the apparatus or the processing executed thereby can be simple even if the number of singing phonetic units is increased.

[0014] Preferably, the input section inputs modifying information for modifying the generation time length of the first phoneme, and the calculating section modifies the generation time length of the first phoneme in the phonetic unit transition time length read by the readout section according to the modifying information inputted by the input section, and then calculates the singing-starting time point and the singing duration time of the first phoneme and the singing-starting time point and the singing duration time of the second phoneme, based on the phonetic unit transition time length including the modified generation time length of the first phoneme.
[0015] According to this preferred embodiment, it is 
possible to reflect the operator's intention on the sing- 
ing-starting time points and singing duration times of the 
first and second phonemes, and hence synthesize more 
natural singing voices. 
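The readout and modification described in the third aspect can be sketched as a table lookup followed by scaling. This is a hedged illustration only: the table `TRANSITION_DB`, its millisecond values, and the function `read_transition` are hypothetical, and treating the modifying information as a multiplicative ratio applied to the first phoneme's generation time length is an assumption.

```python
# Hypothetical storage section: (first phoneme, second phoneme) ->
# (first phoneme generation time length, second phoneme generation
# time length), both in milliseconds. The values are made up.
TRANSITION_DB = {
    ("s", "a"): (80.0, 40.0),
    ("t", "a"): (50.0, 40.0),
}


def read_transition(first: str, second: str, consonant_ratio: float = 1.0):
    """Read a phonetic unit transition time length and apply the
    modifying information to the first phoneme only.

    Assumption: the modifying information is a ratio, where values
    above 1 lengthen the consonant and values below 1 shorten it.
    """
    first_len, second_len = TRANSITION_DB[(first, second)]
    return first_len * consonant_ratio, second_len
```

Because the lengths come from a table, supporting more phonetic units only means adding entries, which matches the simplicity argument made for the storage section.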

[0016] To attain the above object, according to a fourth aspect of the invention, there is provided a singing voice-synthesizing apparatus comprising an input section that inputs phonetic unit information representative of a phonetic unit, time information representative of a singing-starting time point, and singing length information representative of a singing length, for a singing phonetic unit, a storage section that stores state transition time lengths corresponding to a rise portion, a note transition portion, or a fall portion of the singing phonetic unit, a readout section that reads out the state transition time length from the storage section based on the phonetic unit information inputted by the input section, and a singing voice-synthesizing section that generates a singing voice formed by the phonetic unit, based on the phonetic unit information, the time information, and the singing length information which have been inputted by the input section, the singing voice-synthesizing section adding a change in at least one of pitch and amplitude to the singing voice during a time period corresponding to the state transition time length read out by the readout section.

[0017] This singing voice-synthesizing apparatus implements the singing voice-synthesizing method according to the second aspect of the invention, and hence the same advantageous effects described as to this method can be obtained. Further, since the apparatus is configured such that the state transition time length is read from the storage section, the construction of the apparatus or the processing executed thereby can be simple even if the number of singing phonetic units is increased.

[0018] Preferably, the input section inputs modifying information for modifying the state transition time lengths, and the singing voice-synthesizing apparatus includes a modifying section that modifies the corresponding state transition time length read out by the readout section based on the modifying information inputted by the input section, the singing voice-synthesizing section adding a change in at least one of pitch and amplitude to the singing voice during a time period corresponding to the state transition time length modified by the modifying section.

[0019] According to this preferred embodiment, it is possible to reflect the operator's intention on the state transition time length, and hence synthesize more natural singing voices.

[0020] To attain the above object, according to a fifth aspect of the invention, there is provided a singing voice-synthesizing apparatus comprising an input section that inputs phonetic unit information representative of a phonetic unit, time information representative of a singing-starting time point, singing length information representative of a singing length, and effects-imparting information, for a singing phonetic unit, and a singing voice-synthesizing section that generates a singing voice formed by the phonetic unit, based on the phonetic unit information, the time information, and the singing length information which have been inputted by the input section, the singing voice-synthesizing section imparting effects to the singing voice based on the effects-imparting information inputted by the input section.
[0021] According to this singing voice-synthesizing apparatus, it is possible to add minute changes in pitch and amplitude, e.g. those in vibrato effect, to singing voices, whereby more natural singing voices can be synthesized.
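As one way to picture a minute pitch change of the vibrato kind, the sketch below modulates a base pitch with a sine wave. The sinusoidal shape, the function name `vibrato_pitch_hz`, and the default rate and depth are assumptions chosen for illustration; the text does not state how the vibrato effect is computed.

```python
import math


def vibrato_pitch_hz(base_hz: float, t_s: float,
                     rate_hz: float = 6.0,
                     depth_cents: float = 50.0) -> float:
    """Return the instantaneous pitch at time t_s (seconds).

    Assumption: vibrato is a sinusoidal pitch deviation of at most
    depth_cents around base_hz, repeating rate_hz times per second.
    """
    cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t_s)
    return base_hz * 2.0 ** (cents / 1200.0)  # 1200 cents per octave
```

An amplitude vibrato could be sketched analogously by scaling a gain instead of a frequency.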

[0022] Preferably, the effects-imparting information inputted by the input section represents an effects-imparting time period, and the singing voice-synthesizing apparatus further comprises a setting section that sets a new effects-imparting time period corresponding to both the effects-imparting time period represented by the effects-imparting information and a second effects-imparting time period of a singing phonetic unit preceding the singing phonetic unit if the effects-imparting time period is continuous from the second effects-imparting time period, the singing voice-synthesizing section imparting effects to the singing voice during the new effects-imparting time period set by the setting section.
[0023] According to this preferred embodiment, since effects are imparted by setting a new effects-imparting time period corresponding to effects-imparting time periods continuous to each other, the effects are not interrupted, which improves their continuity.
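The setting of a new effects-imparting time period from continuous periods can be sketched as an interval merge. This is only an illustrative sketch: the function `merge_effect_periods` and the (start, end) millisecond tuples are hypothetical, and treating overlapping periods the same as exactly adjoining ones is an assumption.

```python
def merge_effect_periods(periods):
    """Merge effects-imparting time periods that are continuous.

    periods: iterable of (start_ms, end_ms) tuples.
    Assumption: a period is continuous from the preceding one when it
    starts at or before the preceding period's end.
    """
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Continuous with the preceding period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With this, an effect spanning two adjoining phonetic units is applied over one period rather than restarted at the boundary.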
[0024] To attain the above object, according to a sixth aspect of the invention, there is provided a singing voice-synthesizing apparatus comprising an input section that inputs phonetic unit information representative of a phonetic unit, time information representative of a singing-starting time point, and singing length information representative of a singing length, for a singing phonetic unit, in timing earlier than the singing-starting time point, a setting section that randomly sets a new singing-starting time point, within a predetermined time range extending before and after the singing-starting time point, based on the time information inputted by the input section, and a singing voice-synthesizing section that generates a singing voice formed by the phonetic unit, based on the phonetic unit information and the singing length information which have been inputted by the input section, and the singing-starting time point set by the setting section, the singing voice-synthesizing section starting generation of the singing voice at the new singing-starting time point set by the setting section.

[0025] According to this singing voice-synthesizing apparatus, a new singing-starting time point is randomly set within a predetermined time range extending before and after the singing-starting time point represented by the time information, and a singing voice is generated at the set singing-starting time point. This makes it possible to synthesize more natural singing voices with variations in singing-starting timing.
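A minimal sketch of randomly setting a new singing-starting time point is given below. The uniform distribution, the symmetric range, and the names `jitter_start` and `max_offset_ms` are assumptions for illustration; the text only requires that the new point fall within a predetermined range before and after the original point.

```python
import random


def jitter_start(note_on_ms: float, max_offset_ms: float, rng=random) -> float:
    """Pick a new singing-starting time point at random.

    Assumption: the new point is drawn uniformly from
    [note_on_ms - max_offset_ms, note_on_ms + max_offset_ms].
    rng may be the random module or a random.Random instance, which
    makes the result reproducible when a seed is used.
    """
    return note_on_ms + rng.uniform(-max_offset_ms, max_offset_ms)
```

Since the new point can be earlier than the original one, the performance data has to arrive ahead of time, which is why the sixth aspect also requires input in timing earlier than the singing-starting time point.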

[0026] To attain the above object, there is provided a storage medium storing a program for executing the singing voice-synthesizing method according to the first aspect of the invention.

[0027] Similarly, there is provided a storage medium storing a program for executing the singing voice-synthesizing method according to the second aspect of the invention.
[0028] The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings.


BRIEF DESCRIPTION OF THE DRAWINGS 
[0029] 

FIGS. 1A and 1B show singing-starting timing of human singing, and singing-starting timing of a singing voice synthesized by a singing voice-synthesizing method according to the present invention, for comparison;
FIG. 2 is a block diagram showing the circuit configuration of a singing voice-synthesizing apparatus according to an embodiment of the present invention;
FIG. 3 is a flowchart useful in explaining the outline of a singing voice-synthesizing process executed by the FIG. 2 apparatus;
FIG. 4 is a diagram showing information stored in performance data;
FIG. 5 is a diagram showing information stored in a phonetic unit database (DB);
FIGS. 6A and 6B are diagrams showing information stored in a phonetic unit transition DB;
FIG. 7 is a diagram showing information stored in a state transition DB;
FIG. 8 is a diagram showing information stored in a vibrato DB;
FIG. 9 is a diagram useful in explaining a process of singing voice synthesis based on performance data;
FIG. 10 is a diagram showing a state of a reference score and a singing voice synthesis score being formed;
FIG. 11 is a diagram showing a manner of forming a singing voice synthesis score when performance data is added to the reference score;
FIG. 12 is a diagram showing a manner of forming the singing voice synthesis score when performance data is inserted into the reference score;
FIG. 13 is a diagram showing a manner of forming the singing voice synthesis score and a manner of synthesizing singing voices;
FIG. 14 is a diagram useful in explaining various items in a phonetic unit track in FIG. 13;
FIG. 15 is a diagram useful in explaining various items in a transition track in FIG. 13;
FIG. 16 is a diagram useful in explaining various items in a vibrato track in FIG. 13;
FIG. 17 is a flowchart showing a performance data-receiving process/singing voice synthesis score-forming process;
FIG. 18 is a flowchart showing the details of the singing voice synthesis score-forming process;
FIG. 19 is a flowchart showing a management data-forming process;



FIG. 20 is a diagram useful in explaining a management data-forming process in the case of Event State = Transition;
FIG. 21 is a diagram useful in explaining a management data-forming process in the case of Event State = Attack;
FIG. 22 is a flowchart showing a phonetic unit track-forming process;
FIG. 23 is a flowchart showing a phonetic unit transition length-retrieving process;
FIG. 24 is a flowchart showing a silence singing length-calculating process;
FIG. 25 is a diagram showing a consonant singing length-calculating process in the case of a consonant expansion/compression ratio being larger than 1, in the FIG. 24 process;
FIG. 26 is a diagram showing a consonant singing length-calculating process in the case of the consonant expansion/compression ratio being smaller than 1, in the FIG. 24 process;
FIGS. 27A to 27C are diagrams showing examples of silence singing length calculation;
FIG. 28 is a flowchart showing a preceding vowel singing length-calculating process;
FIG. 29 is a diagram showing a consonant singing length-calculating process in the case of the consonant expansion/compression ratio being larger than 1, in the FIG. 28 process;
FIG. 30 is a diagram showing a consonant singing length-calculating process in the case of the consonant expansion/compression ratio being smaller than 1, in the FIG. 28 process;
FIGS. 31A to 31C are diagrams showing examples of preceding vowel singing length calculation;
FIG. 32 is a flowchart showing a vowel singing length-calculating process;
FIG. 33 is a diagram showing an example of vowel singing length calculation;
FIG. 34 is a flowchart showing a transition track-forming process;
FIGS. 35A to 35C are diagrams showing examples of calculation of transition time lengths NONEn and NONEs;
FIGS. 36A to 36G are diagrams showing an example of calculation of transition time lengths pNONEn and NONEs;
FIG. 37 is a flowchart showing a vibrato track-forming process;
FIGS. 38A to 38E are diagrams showing examples of vibrato track formation;
FIGS. 39A to 39E are diagrams showing examples of variations of silence singing length calculation; and
FIGS. 40A and 40B show singing-starting timing of human singing, and singing-starting timing of singing voices synthesized according to the prior art, respectively, for comparison.
DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS 



[0030] The present invention will now be described in detail with reference to the drawings showing a preferred embodiment thereof.

[0031] Referring first to FIGS. 1A and 1B, the outline of a singing voice-synthesizing method according to an embodiment of the present invention will be described. FIG. 1A shows consonant singing-starting timing and vowel singing-starting timing of human singing, similarly to FIG. 40A, while FIG. 1B shows singing-starting timing of singing voices synthesized by the singing voice-synthesizing method according to the present embodiment.
[0032] In the present embodiment, performance data which is comprised of phonetic unit information, singing-starting time information, and singing length information is inputted for each of phonetic units which constitute a lyric such as "saita", each phonetic unit consisting of "sa", "i", or "ta". The singing-starting time information represents an actual singing-starting time point (e.g. timing of a first beat of a time), such as T1 shown in FIG. 1B. Each performance data is inputted in timing earlier than the actual singing-starting time point, and has its phonetic unit information converted to a phonetic unit transition time length. The phonetic unit transition time length consists of a first phoneme generation time length and a second phoneme generation time length, for a phonetic unit, e.g. "sa", formed by a first phoneme ("s") and a second phoneme ("a"). This phonetic unit transition time, the singing-starting time information, and the singing length information are used to determine the respective singing-starting time points of the first and second phonemes and the respective singing duration times of the first and second phonemes. At this time, the singing-starting time point of the consonant "s" is set to be earlier than the actual singing-starting time point T1. This also applies to the phonetic unit "ta". The singing-starting time point of the vowel "a" is set equal to or earlier or later than the actual singing-starting time point T1. This also applies to the phonetic units "i" and "ta". In the FIG. 1B example, for the phonetic unit "sa", the singing-starting time point of the consonant "s" is set earlier than the actual singing-starting time point T1 so as to be adapted to the FIG. 1A case of human singing, and the singing-starting time point of the vowel "a" is set equal to the actual singing-starting time point T1; for the phonetic unit "i", the singing-starting time point thereof is set to the actual singing-starting time point T2; and for the phonetic unit "ta", the singing-starting time point of the consonant "t" is set earlier than the actual singing-starting time point T3 so as to be adapted to the FIG. 1A case of human singing, and the singing-starting time point of the vowel "a" is set equal to the actual singing-starting time point T3.
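The FIG. 1B placement for a unit such as "sa" can be sketched as follows. This is a simplified illustration under stated assumptions: the names `PhonemeTiming` and `schedule_unit` are hypothetical, the consonant is given exactly its generation time length and ends at the actual singing-starting time point, and the vowel is assumed to last for the full singing length. The detailed length calculations of the embodiment are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PhonemeTiming:
    phoneme: str
    start_ms: float      # singing-starting time point
    duration_ms: float   # singing duration time


def schedule_unit(consonant: str, vowel: str, note_on_ms: float,
                  singing_length_ms: float, consonant_length_ms: float):
    """Place the consonant before the actual singing-starting time
    point and the vowel exactly at it (the FIG. 1B case).

    Assumptions: the consonant occupies consonant_length_ms right
    before note_on_ms; the vowel lasts singing_length_ms.
    """
    consonant_start = note_on_ms - consonant_length_ms
    return [
        PhonemeTiming(consonant, consonant_start, consonant_length_ms),
        PhonemeTiming(vowel, note_on_ms, singing_length_ms),
    ]
```

For example, with a note-on at 1000 ms and an 80 ms consonant, the consonant is scheduled at 920 ms, which is only possible because the performance data was received before 920 ms.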

[0033] In the singing voice synthesis, the consonant "s" starts to be generated at the determined singing-starting time point and continues to be generated over the determined singing duration time. This also applies to the phonetic units "i" and "ta". As a result, the singing voices synthesized by the present method become very natural, with singing-starting time points and singing duration times approximating those of the FIG. 1A case of human singing.
[0034] FIG. 2 shows the circuit configuration of a singing voice-synthesizing apparatus according to an embodiment of the present invention. This singing voice-synthesizing apparatus has its operation controlled by a small-sized computer.

[0035] The singing voice-synthesizing apparatus is comprised of a CPU (Central Processing Unit) 12, a ROM (Read Only Memory) 14, a RAM (Random Access Memory) 16, a detection circuit 20, a display circuit 22, an external storage device 24, a timer 26, a tone generator circuit 28, and a MIDI (Musical Instrument Digital Interface) interface 30, all connected to each other via a bus 10.

[0036] The CPU 12 performs operations of various processes concerning the generation of musical tones, the synthesis of singing voices, etc. according to programs stored in the ROM 14. The process concerning the synthesis of singing voices (singing voice-synthesizing process) will be described in detail hereinafter with reference to flowcharts shown in FIG. 17 etc.
[0037] The RAM 16 includes various storage sections used as working areas for processing operations of the CPU 12, and is provided with, as a storage section related to the execution of the present invention, a receiving buffer in which received performance data are written, etc.
[0038] The detection circuit 20 detects operating information concerning operations of various operating elements of an operating element group 34 arranged on a panel, not shown.

[0039] The display circuit 22 controls the operation of 
a display 36 to thereby enable various images to be dis- 
played thereon. 

[0040] The external storage device 24 is comprised of a drive in which at least one type of storage medium, e.g. an HD (hard disk), an FD (floppy disk), a CD (compact disk), a DVD (digital versatile disk), and an MO (magneto-optical disk) can be removably mounted. When a desired storage medium is mounted in the external storage device 24, data can be transferred from the storage medium to the RAM 16. Further, when the storage medium is a writable one, such as an HD and an FD, data can be transferred from the RAM 16 to the storage medium.

[0041] As program-recording means, there may be employed a storage medium mounted in the external storage device 24 instead of the ROM 14. In this case, a program stored in the storage medium is transferred from the storage medium to the RAM 16. Then, the CPU 12 is operated according to the program stored in the RAM 16. This makes it possible to add a program or upgrade the same, with ease.
[0042] The timer 26 generates a tempo clock signal TCL having
a repetition period corresponding to a tempo designated by
tempo data TM, and the tempo clock signal TCL is supplied to
the CPU 12 as an interrupt command. The CPU 12 carries out the
singing voice synthesis by executing an interrupt-handling
process in response to the tempo clock signal TCL. The tempo
designated by the tempo data TM can be varied according to the
operation of a tempo-setting operating element of the operating
element group 34. The repetition period of generation of the
tempo clock signal TCL can be set e.g. to 5 ms.
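
Since starting time points and duration times are later expressed as numbers of clocks of the tempo clock signal TCL, the conversion from a time length to a clock count can be sketched as follows. This is only an illustrative sketch: the function name `ms_to_clocks`, the constant `TCL_PERIOD_MS`, and the round-to-nearest policy are assumptions, and the 5 ms period is merely the example value given above.

```python
# Hypothetical helper; not part of the described apparatus.
TCL_PERIOD_MS = 5  # example repetition period of the tempo clock signal TCL

def ms_to_clocks(duration_ms: float, period_ms: float = TCL_PERIOD_MS) -> int:
    """Convert a time length in milliseconds to a count of TCL periods.

    Rounding to the nearest whole clock is an assumption of this sketch.
    """
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    return int(round(duration_ms / period_ms))

# Example: a 100 ms consonant lasts 20 clocks at a 5 ms period.
consonant_clocks = ms_to_clocks(100)
```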

[0043] The tone generator circuit 28 includes a large number
of tone-generating channels and a large number of singing
voice-synthesizing channels. The singing voice-synthesizing
channels synthesize singing voices based on a
formant-synthesizing method. In the singing voice-synthesizing
process, described hereinafter, singing voice signals are
generated from the respective singing voice-synthesizing
channels. The thus generated tone signals and/or singing voice
signals are converted to sound or acoustic waves by a sound
system 38.

[0044] The MIDI interface 30 is provided for MIDI
communication between the present singing voice-synthesizing
apparatus and a MIDI apparatus 39 provided as a separate unit.
In the present embodiment, the MIDI interface 30 is used for
receiving performance data from the MIDI apparatus 39, so as to
synthesize singing voices. The singing voice-synthesizing
apparatus may be configured such that performance data for
accompaniment for singing may be received together with
performance data for the singing voice synthesis from the MIDI
apparatus 39, and the tone generator circuit 28 generates
musical tone signals for the accompaniment based on the
performance data for the accompaniment of singing, so that the
sound system 38 generates accompaniment sounds.

[0045] Next, the outline of the singing voice-synthesizing
process carried out by the singing voice-synthesizing apparatus
according to the present embodiment will be described with
reference to FIG. 3. In a step S40, performance data is
inputted. More specifically, the performance data is received
from the MIDI apparatus 39 via the MIDI interface 30. The
details of the performance data will be described hereinafter
with reference to FIG. 4.

[0046] In a step S42, based on each received performance
data, a phonetic unit transition time length and a state
transition time length are retrieved from a phonetic unit
transition DB (database) 14b and a state transition DB
(database) 14c within a singing voice synthesis DB (database)
14A. Based on the phonetic unit transition time length, the
state transition time length and the performance data, a
singing voice synthesis score is formed. The singing voice
synthesis score is comprised of three tracks: a phonetic unit
track, a transition track, and a vibrato track. The phonetic
unit track contains information of singing-starting time
points, singing duration times, etc., the transition track
contains information of starting time points and duration times
of transition states, such as attack, and the vibrato track
contains information of starting time points and duration times
of a vibrato-added state, and the like.
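
The three-track layout described above can be pictured with a small data-structure sketch. The class and field names (`TrackItem`, `SynthesisScore`, `active_at`) are hypothetical and chosen only for illustration; the description itself specifies only that each track holds items with starting time points and duration times.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TrackItem:
    begin_time: int                  # starting time point, in tempo clocks
    duration: int                    # duration time, in tempo clocks
    label: str                       # e.g. "Sil", "s_a", "Attack", "ON"
    item_type: Optional[str] = None  # e.g. "normal" for an attack or vibrato

@dataclass
class SynthesisScore:
    phonetic_unit_track: List[TrackItem] = field(default_factory=list)
    transition_track: List[TrackItem] = field(default_factory=list)
    vibrato_track: List[TrackItem] = field(default_factory=list)

def active_at(track: List[TrackItem], t: int) -> Optional[TrackItem]:
    """Return the item whose [begin, begin + duration) interval contains t."""
    for item in track:
        if item.begin_time <= t < item.begin_time + item.duration:
            return item
    return None

# Illustrative values only; the figures do not give concrete numbers.
score = SynthesisScore(
    phonetic_unit_track=[TrackItem(0, 80, "Sil"), TrackItem(80, 20, "Sil_s"),
                         TrackItem(100, 20, "s_a"), TrackItem(120, 60, "a")],
    transition_track=[TrackItem(0, 100, "NONE"),
                      TrackItem(100, 30, "Attack", "normal")],
    vibrato_track=[TrackItem(0, 140, "OFF"), TrackItem(140, 40, "ON", "normal")],
)
```

At any clock count, a synthesizing engine organized this way would look up one active item per track, for example `active_at(score.transition_track, 110)` for the attack state.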

[0047] In a step S44, the singing voice synthesis is performed
by a singing voice-synthesizing engine. More particularly, the
singing voice synthesis is carried out based on the performance
data inputted in the step S40, the singing voice synthesis
scores formed in the step S42, and tone generator control
information retrieved from the phonetic unit DB 14a, the
phonetic unit transition DB 14b, the state transition DB 14c
and the vibrato DB 14d, whereby singing voice signals are
generated in the order of voices to be sung. In the singing
voice-synthesizing process, a singing voice formed by a single
phonetic unit (e.g. "a") designated by the phonetic unit track
or a transitional phonetic unit (e.g. "sa" in which transition
from "s" to "a" occurs) and at the same time having pitch
designated by the performance data starts to be generated at a
singing-starting time point designated by the phonetic unit
track and continues to be generated over a singing duration
time designated by the phonetic unit track.

[0048] To the singing voice thus generated, minute changes in
pitch, amplitude and the like can be added at and after the
starting time of a transition state, such as attack, designated
by the transition track, and the state in which such changes
are added to the singing voice can be continued over a duration
time of the transition state, such as attack, designated by the
transition track. Further, a vibrato can be added to the
singing voice at and after a starting time designated by the
vibrato track, and the state in which the vibrato is added to
the singing voice can be continued over a duration time
designated by the vibrato track.

[0049] In steps S46 and S48, processes are carried out within
the tone generator circuit 28. In the step S46, the singing
voice signal is subjected to D/A (digital-to-analog)
conversion, and in the step S48, the singing voice signal
subjected to the D/A conversion is outputted to the sound
system 38 to cause the same to be sounded as a singing voice.

[0050] FIG. 4 shows information contained in the performance
data. The performance data contains performance information
necessary for singing one syllable, and the performance
information contains note information, phonetic unit track
information, transition track information, and vibrato track
information.
[0051] The note information contains note-on information
indicative of an actual singing-starting time point, duration
information indicative of actual singing length, and pitch
information indicative of the pitch of singing voice. The
phonetic unit track information contains information of a
singing phonetic unit (denoted by PhU), consonant modification
information representative of a singing consonant
expansion/compression ratio, etc. In the present embodiment, it
is assumed that the singing voice synthesis is carried out to
synthesize singing voices of a Japanese-language song, and
hence the phonemes appearing in the singing voices are
consonants and vowels, and further, the phonetic unit state
(PhU State) can be a combination of a consonant and a vowel, a
vowel alone, or a voiced consonant (nasal sound, half vowel)
alone. If the phonetic unit state is the voiced consonant
alone, the singing-starting time point of the voiced consonant
is similar to that of the vowel-alone case, and hence the
phonetic unit state is handled as the vowel alone.

[0052] The transition track information contains attack type
information indicative of a singing attack type, attack rate
information indicative of a singing attack
expansion/compression ratio, release type information
indicative of a singing release type, release rate information
indicative of a singing release expansion/compression ratio,
note transition type information indicative of a singing note
transition type, etc. The attack type designated by the attack
type information includes "normal", "sexy", "sharp", "soft",
etc. The release type information and the note transition type
information can also designate one of a plurality of types,
similarly to the attack type. The note transition means a
transition from the present performance data (performance
event) to the next performance data (performance event). The
singing attack expansion/compression ratio, the singing release
expansion/compression ratio, and the note transition
expansion/compression ratio are each set to a value larger than
1 when the state transition time length associated therewith is
desired to be increased, and to a value smaller than 1 when the
same is desired to be decreased. These ratios can also be set
to 1, and in this case, addition of minute changes in pitch,
amplitude and the like accompanying the attack, release and
note transition is not carried out.

[0053] The vibrato track information contains information of a
vibrato number indicative of the number of vibrato events in
the present performance data, information of vibrato delay 1
indicative of a delay time of a first vibrato, information of
vibrato duration 1 indicative of a duration time of the first
vibrato, information of vibrato delay K indicative of a delay
time of a K-th vibrato, where K is equal to or larger than 2,
information of vibrato duration K indicative of a duration time
of the K-th vibrato, and information of vibrato type K
indicative of a type of the K-th vibrato. When the number of
vibrato events is 0, the information of vibrato delay 1, et
seq. is not contained in the vibrato track information. The
vibrato type designated by the information of vibrato type 1 to
vibrato type K includes "normal", "sexy", and "enka (Japanese
traditional popular song)".

[0054] Although the singing voice synthesis DB 14A shown in
FIG. 3 is provided within the ROM 14 in the present embodiment,
this is not limitative, but the same may be provided in the
external storage device 24 and transferred therefrom when it is
used. Within the singing voice synthesis DB 14A, there are
provided the phonetic unit DB 14a, the phonetic unit transition
DB 14b, the state transition DB 14c, the vibrato DB 14d, ...,
another DB 14n.

[0055] Next, the information stored in the phonetic unit DB
14a, the phonetic unit transition DB 14b, the state transition
DB 14c, and the vibrato DB 14d will be described with reference
to FIGS. 5 to 8. The phonetic unit DB 14a and the vibrato DB 14d
store tone generator control information as shown in FIGS. 5
and 8, respectively. The phonetic unit transition DB 14b stores
phonetic unit transition time lengths and tone generator
control information, as shown in FIG. 6B, and the state
transition DB 14c stores state transition time lengths and tone
generator control information, as shown in FIG. 7. When such
storage information is prepared, singing voices of a singer are
analyzed to determine tone generator control information,
phonetic unit transition time lengths and state transition time
lengths. Further, as to the types of "normal", "sexy", "soft",
"enka", etc., singing voices are recorded by asking the singer
to sing the song with the same type of tinged sound (e.g. by
asking "Please sing by adding a sexy attack." or "Please sing
by adding enka-tinged vibrato."), and the recorded singing
voices are analyzed to determine the tone generator control
information, the phonetic unit transition time lengths, and the
state transition time lengths for the specific type. The tone
generator control information is comprised of formant
frequencies and control parameters of formant levels necessary
for synthesizing desired singing voices.

[0056] The phonetic unit DB 14a shown in FIG. 5 stores tone
generator control information for each pitch, such as "P1" and
"P2", within each phonetic unit, such as "a", "i", "M", and
"Sil". In FIGS. 5 to 8 and the following description, the
symbol "M" represents a phonetic unit "u", and "Sil" represents
silence. During the singing voice synthesis, the tone generator
control information adapted to the phonetic unit and pitch of a
singing voice to be synthesized is selected from the phonetic
unit DB 14a.

[0057] FIG. 6A shows phonetic unit transition time lengths (a)
to (f) stored in the phonetic unit transition DB 14b. In FIG. 6A
and the following description, the symbols "V_Sil" etc.
represent the following:

(a) "V_Sil" represents a phonetic unit transition from a vowel
to silence, and, for example, in FIG. 6B, corresponds to a
combination of the preceding vowel "a" and the following
phonetic unit "Sil".

(b) "Sil_C" represents a phonetic unit transition from silence
to a consonant, and, for example, in FIG. 6B, corresponds to a
combination of the preceding phonetic unit "Sil" and the
following consonant "s", not shown.

(c) "C_V" represents a phonetic unit transition from a
consonant to a vowel, and, for example, in FIG. 6B, corresponds
to a combination of the preceding consonant "s", not shown, and
the following vowel "a", not shown.

(d) "Sil_V" represents a phonetic unit transition from silence
to a vowel, and, for example, in FIG. 6B, corresponds to a
combination of the preceding phonetic unit "Sil" and the
following vowel "a".

(e) "pV_C" represents a phonetic unit transition from a
preceding vowel to a consonant, and, for example, in FIG. 6B,
corresponds to a combination of the preceding vowel "a" and the
following consonant "s", not shown.

(f) "pV_V" represents a phonetic unit transition from a
preceding vowel to a vowel, and, for example, in FIG. 6B,
corresponds to a combination of the preceding vowel "a" and the
following vowel "i".
[0058] The phonetic unit transition DB 14b shown in FIG. 6B
stores a phonetic unit transition time length and tone
generator control information for each pitch, such as "P1" and
"P2", within each combination of phonetic units (i.e.
transition in the phonetic units), such as "a" - "i". In FIG.
6B, "aspiration" represents a sound of aspiration. The phonetic
unit transition time length consists of a combination of a time
length of the preceding phonetic unit and a time length of the
following phonetic unit, with the boundary between the two time
lengths being held as time slot information. When the singing
voice synthesis score is formed, a phonetic unit transition
time length suitable for the combination of phonetic units
which should form the phonetic unit track and the pitch thereof
is selected from the phonetic unit transition DB 14b. Further,
during the singing voice synthesis, tone generator control
information suitable for the combination of phonetic units of a
singing voice to be synthesized and the pitch thereof is
selected from the phonetic unit transition DB 14b.

[0059] The state transition DB 14c shown in FIG. 7 stores a
state transition time length and tone generator control
information for each pitch, such as "P1" and "P2", within each
phonetic unit, such as "a" and "i", for each of the state
types, i.e. "normal", "sexy", "sharp" and "soft", within each
of the transition states, i.e. attack, note transition (denoted
as "NtN") and release. The state transition time length
corresponds to a duration time of a transition state, such as
attack, note transition and release. When the singing voice
synthesis score is formed, a state transition time length
suitable for the transition state, transition track, transition
type, phonetic unit, and pitch of a singing voice to be
synthesized, which should form the transition track, is
selected from the state transition DB 14c.

[0060] The vibrato DB 14d shown in FIG. 8 stores tone generator
control information for each pitch, such as "P1" and "P2",
within each phonetic unit, such as "a" and "i", for each of the
vibrato types, "normal", "sexy", and "enka". When the singing
voice synthesis score is formed, the tone generator control
information suitable for the vibrato type, phonetic unit, and
pitch of a singing voice to be synthesized is selected from the
vibrato DB 14d.

[0061] FIG. 9 illustrates a manner of singing voice synthesis
based on performance data. Assuming that performance data S1,
S2, and S3 designate, similarly to FIG. 1B, "sa: C3: T1...",
"i: D3: T2...", and "ta: E3: T3...", respectively, the
performance data S1, S2, S3 are transmitted at respective time
points t1, t2, t3 earlier than the actual singing-starting time
points T1, T2, T3, and received via the MIDI interface 30. The
process of transmitting/receiving the performance data
corresponds to the process of inputting performance data in the
step S40. Whenever each performance data is received, in the
step S42, a singing voice synthesis score is formed for the
performance data.

[0062] Then, in the step S44, according to the formed singing
voice synthesis scores, singing voices SS1, SS2, SS3 are
synthesized. As a result of the singing voice synthesis, it is
possible to start generation of the consonant "s" of the
singing voice SS1 at a time point earlier than the time point
T1, and further the vowel "a" of the singing voice SS1 at the
time point T1. Also, it is possible to start generation of the
vowel "i" of the singing voice SS2 at the time point T2.
Further, it is possible to start generation of the consonant
"t" of the singing voice SS3 at a time point T31 earlier than
the time point T3, and further the vowel "a" of the singing
voice SS3 at the time point T3. If desired, it is also possible
to start generation of the vowel "a" of the phonetic unit "sa"
or the vowel "i" of the phonetic unit "i" earlier than the
respective time points T1 and T2.

[0063] FIG. 10 illustrates a procedure of generation of
reference scores and singing voice synthesis scores in the step
S42. In the present embodiment, a reference score-forming
process is carried out as preprocessing prior to the singing
voice synthesis score-forming process. More specifically,
performance data transmitted at the time points t1, t2, t3 are
sequentially received and written into the receiving buffer
within the RAM 16. From the receiving buffer, the performance
data are transferred to a storage section, referred to as
"reference score", within the RAM 16, in the order of actual
singing-starting time points designated by the performance
data, and sequentially written thereinto, e.g. in the order of
performance data S1, S2, S3. Then, singing voice synthesis
scores are formed in the order of actual singing-starting time
points based on the performance data in the reference score.
For example, based on the performance data S1, a singing voice
synthesis score SC1 is formed, and based on the performance
data S2, a singing voice synthesis score SC2 is formed.
Thereafter, as described hereinbefore with reference to FIG. 9,
the singing voice synthesis is carried out according to the
singing voice synthesis scores SC1, SC2, ...
[0064] The above description concerns the processes of forming
reference scores and singing voice synthesis scores when the
transmission and reception of performance data are carried out
in the order of actual singing-starting time points. When the
transmission and reception of performance data are not carried
out in the order of actual singing-starting time points,
reference scores and singing voice synthesis scores are formed
in manners as illustrated in FIGS. 11 and 12. More
specifically, it is assumed that performance data S1, S3, S4
are transmitted at respective time points t1, t2, t3, and
sequentially received, as shown in FIG. 11. Then, after the
performance data S1 is written into the reference score, the
performance data S3 and S4 are sequentially written thereinto,
and based on the performance data S1, S3, singing voice
synthesis scores SC1, SC3a are respectively formed. The writing
of performance data into the reference score at a second or
later time point will be referred to as "addition" if they are
simply written into the reference score in an adding fashion as
illustrated in FIGS. 10 and 11, while the same will be referred
to as "insertion" if they are written in an inserting fashion
as illustrated in FIG. 12. Assuming that thereafter, at a time
point t4, performance data S2 is transmitted and received, as
shown in FIG. 12, the performance data S2 is inserted between
the performance data S1 and S3 within the reference score. The
singing voice synthesis score(s) after the actual
singing-starting time point at which the insertion of
performance data has occurred is/are discarded, and based on
the performance data thus updated after the actual
singing-starting time point at which the insertion of
performance data has occurred, new singing voice synthesis
scores are formed. For example, the singing voice synthesis
score SC3a is discarded, and based on the performance data S2,
S3, singing voice synthesis scores SC2, SC3b are formed,
respectively.
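
The "addition" and "insertion" cases can be sketched as one routine that keeps the reference score sorted by actual singing-starting time point and re-forms only the synthesis scores from the insertion position onward. This is a simplified sketch under stated assumptions: the names `write_to_reference` and `make_score` are hypothetical, each performance data item is reduced to a `(start_time, lyric)` tuple, and a score is assumed to depend only on the item and its predecessor.

```python
import bisect
from typing import Callable, List, Optional, Tuple

Event = Tuple[int, str]  # (actual singing-starting time point, phonetic unit)

def write_to_reference(reference: List[Event], scores: List[str], event: Event,
                       make_score: Callable[[Optional[Event], Event], str]) -> str:
    """Write one performance data item and update the synthesis scores.

    `reference` is kept sorted by starting time; `scores` is parallel to it.
    Returns "addition" or "insertion" depending on where the item landed.
    """
    starts = [e[0] for e in reference]
    pos = bisect.bisect_right(starts, event[0])
    reference.insert(pos, event)
    # Discard scores from the insertion point onward and re-form them.
    del scores[pos:]
    for i in range(pos, len(reference)):
        prev = reference[i - 1] if i > 0 else None
        scores.append(make_score(prev, reference[i]))
    return "addition" if pos == len(reference) - 1 else "insertion"

def demo_score(prev: Optional[Event], cur: Event) -> str:
    # Stand-in for real score formation: records the unit transition only.
    return (prev[1] if prev else "Sil") + ">" + cur[1]

reference: List[Event] = []
scores: List[str] = []
first = write_to_reference(reference, scores, (1, "sa"), demo_score)   # S1
second = write_to_reference(reference, scores, (3, "ta"), demo_score)  # S3
third = write_to_reference(reference, scores, (2, "i"), demo_score)    # S2, late
```

In the usage above, the late arrival of the item at time 2 replaces the score formed for "ta" after "sa" with one formed after "i", mirroring the replacement of SC3a by SC3b.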
[0065] FIG. 13 shows an example of singing voice synthesis
scores formed based on performance data in the step S42, and an
example of singing voices synthesized in the step S44. The
singing voice synthesis scores SC are formed within the RAM 16,
and are each formed by a phonetic unit track Tp, a transition
track Tr, and a vibrato track Tg. Data of singing voice
synthesis scores SC are updated or added whenever performance
data is received.
[0066] Assuming, for example, that performance data S1, S2,
and S3 designate, similarly to FIG. 1B, "sa: C3: T1...", "i:
D3: T2...", and "ta: E3: T3...", respectively, information as
shown in FIGS. 13 and 14 is stored in a phonetic unit track Tp.
More specifically, items of information are arranged in the
order of singing, i.e. silence (Sil), a transition (Sil_s) from
the silence to a consonant "s", a transition (s_a) from the
consonant "s" to a vowel "a", the vowel (a), etc. The
information of silence Sil is comprised of items of information
representative of a starting time point (Begin Time = T11), a
duration time (Duration = D11), and a phonetic unit (PhU =
Sil). The information of the transition Sil_s is comprised of
items of information representative of a starting time point
(Begin Time = T12), a duration time (Duration = D12), a
preceding phonetic unit (PhU1 = Sil) and the following phonetic
unit (PhU2 = s). The information of the transition s_a is
comprised of items of information representative of a starting
time point (Begin Time = T13), a duration time (Duration =
D13), the preceding phonetic unit (PhU1 = s) and the following
phonetic unit (PhU2 = a). The information of the vowel a is
comprised of items of information representative of a starting
time point (Begin Time = T14), a duration time (Duration =
D14), and a phonetic unit (PhU = a).

[0067] The information of duration times of phonetic unit
transitions, such as "Sil_s" and "s_a", is comprised of a
combination of the time length of the preceding phonetic unit
and the time length of the following phonetic unit, with the
boundary between the time lengths being held as time slot
information. Therefore, the time slot information can be used
to instruct the tone generator circuit 28 to operate according
to the duration time of the preceding phonetic unit and the
starting time point and duration time of the following phonetic
unit. For example, based on the duration time information of
the transition Sil_s, the circuit 28 can be instructed to
operate according to the duration time of silence and the
singing-starting time point T11 and singing duration time of
the consonant "s", and based on the duration time information
of the transition s_a, the circuit 28 can be instructed to
operate according to the duration time of the consonant "s" and
the singing-starting time point T1 and singing duration time of
the vowel "a".
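
The role of the time slot information can be illustrated with a short sketch that places a transition item around a given boundary. The function name `place_transition`, the dictionary keys, and the numeric values are hypothetical; the sketch also assumes the boundary coincides with the actual singing-starting time point, which is only one of the cases the description allows.

```python
from typing import Dict

def place_transition(boundary_time: int, prev_len: int, next_len: int) -> Dict[str, int]:
    """Place a phonetic unit transition so that its boundary falls at boundary_time.

    prev_len and next_len are the time lengths of the preceding and following
    phonetic units, in tempo clocks. "time_slot" is the offset of the boundary
    from the beginning of the transition.
    """
    return {
        "begin": boundary_time - prev_len,   # preceding unit starts early
        "duration": prev_len + next_len,     # combined transition time length
        "time_slot": prev_len,               # boundary between the two lengths
    }

def following_start(item: Dict[str, int]) -> int:
    """Recover the starting time point of the following phonetic unit."""
    return item["begin"] + item["time_slot"]

# Transition s_a: consonant "s" of 12 clocks, vowel part of 8 clocks, T1 = 200.
s_a = place_transition(boundary_time=200, prev_len=12, next_len=8)
```

With these illustrative numbers the consonant begins at clock 188, ahead of the note-on time, while the vowel begins exactly at clock 200.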
[0068] Information as shown in FIGS. 13 and 15 is stored in
the transition track Tr. More specifically, items of state
information are arranged in the order of occurrence of
transition states, e.g. no transition state (denoted as NONE),
an attack transition state (Attack), a note transition state
(NtN), NONE, a release transition state (Release), NONE, etc.
The state information in the transition track Tr is formed
based on the performance data and information in the phonetic
unit track Tp. The state information of the attack transition
state Attack corresponds to the information of the phonetic
unit transition from "s" to "a" in the phonetic unit track Tp,
the state information of the note transition state NtN
corresponds to the information of the phonetic unit transition
from "a" to "i", and the state information of the release
transition state Release corresponds to the information of the
phonetic unit transition from "a" to "Sil" in the phonetic unit
track Tp. Each state information is used for adding minute
changes in pitch and amplitude to a singing voice synthesized
based on the information of a corresponding phonetic unit
transition. Further, in the example of FIG. 13, the state
information of NtN corresponding to the phonetic unit
transition from "t" to "a" is not provided.
[0069] As shown in FIG. 15, the state information of the first
no transition state NONE is comprised of items of information
representative of a starting time point (Begin Time = T21), a
duration time (Duration = D21), and a transition index (Index =
NONE). The state information of the attack transition state
Attack is comprised of items of information representative of a
starting time point (Begin Time = T22), a duration time
(Duration = D22), a transition index (Index = Attack), and the
type of the transition index (e.g. "normal", Type = Type22).
The state information of the second no transition state NONE is
the same as that of the first no transition state NONE except
that the starting time point and the duration time are T23 and
D23, respectively. The state information of the note transition
state NtN is comprised of items of information representative
of a starting time point (Begin Time = T24), a duration time
(Duration = D24), a transition index (Index = NtN), and the
type of the transition index (e.g. "normal", Type = Type24).
The state information of the third no transition state NONE is
the same as that of the first no transition state NONE except
that the starting time point and the duration time are T25 and
D25, respectively. The state information of the release
transition state Release is comprised of respective items of
information representative of a starting time point (Begin Time
= T26), a duration time (Duration = D26), a transition index
(Index = Release), and the type of the transition index (e.g.
"normal", Type = Type26).
[0070] Information as shown in FIGS. 13 and 16 is stored in
the vibrato track Tg. More specifically, items of the
information are arranged in the order of occurrence of vibrato
events, e.g. vibrato off, vibrato on, vibrato off, and so
forth. The information of a first vibrato off event is
comprised of items of information representative of a starting
time point (Begin Time = T31), a duration time (Duration =
D31), and a transition index (Index = OFF). The information of
a vibrato on event is comprised of items of information
representative of a starting time point (Begin Time = T32), a
duration time (Duration = D32), a transition index (Index =
ON), and the type of the vibrato (e.g. "normal", Type =
Type32). The information of a second vibrato off event is the
same as that of the first one except that the starting time
point and the duration time are T33 and D33, respectively.
[0071] The information of the vibrato on event corresponds to
the information of the vowel "a" of the phonetic unit "ta" in
the phonetic unit track Tp, and is used for adding vibrato-like
changes in pitch and amplitude to a singing voice synthesized
based on the information of the vowel "a". In the information
of the vibrato on event, by setting the starting time point
later than the starting time point T3 at which the singing
voice "a" is to start being generated, by a delay time DL, a
delayed vibrato can be realized. It should be noted that
starting time points T11 to T14, T21 to T26, T31 to T33, etc.,
and duration times D11 to D14, D21 to D26, D31 to D33, etc. can
be set as desired by using the number of clocks of the tempo
clock signal TCL.
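
A delayed vibrato of this kind can be sketched as the construction of off/on/off items for a single note. The function name `vibrato_events`, the dictionary keys, and the decision to clip the vibrato at the end of the note are assumptions of this sketch, not details taken from the description; all inputs are assumed to be non-negative clock counts.

```python
from typing import Dict, List, Union

VibratoItem = Dict[str, Union[int, str]]

def vibrato_events(note_start: int, note_len: int,
                   delay: int, vib_len: int) -> List[VibratoItem]:
    """Build vibrato off/on/off items for one note, all times in tempo clocks.

    The vibrato starts `delay` clocks (the delay time DL) after the note start
    and is clipped so that it never extends past the end of the note.
    """
    note_end = note_start + note_len
    on_start = min(note_start + delay, note_end)
    on_end = min(on_start + vib_len, note_end)
    spans = [(note_start, on_start, "OFF"),
             (on_start, on_end, "ON"),
             (on_end, note_end, "OFF")]
    # Drop empty spans so that only events with a positive duration remain.
    return [{"begin": b, "duration": e - b, "index": idx}
            for b, e, idx in spans if e > b]

# Vowel starting at clock 300 for 100 clocks; vibrato delayed by 20 clocks.
events = vibrato_events(note_start=300, note_len=100, delay=20, vib_len=50)
```

With a 5 ms clock period, the 20-clock delay in this example would correspond to 100 ms between the start of the vowel and the onset of the vibrato.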

[0072] By using the singing voice synthesis score SC and the
performance data S1 to S3, the singing voice-synthesizing
process in the step S44 can synthesize the singing voice as
shown in FIG. 13. After realizing silence time before starting
the singing based on the information of silence Sil in the
phonetic unit track Tp, the tone generator control information
corresponding to the information of the transition Sil_s in the
track Tp and the pitch information of C3 in the performance
data S1 is read out from the phonetic unit transition DB 14b
shown in FIG. 6B to control the tone generator circuit 28,
whereby the consonant "s" starts to be generated at the time
point T11. The control time period at this time corresponds to
the duration time designated by the information of the
transition Sil_s in the track Tp. Then, the tone generator
control information corresponding to the information of the
transition s_a in the track Tp and the pitch information of C3
in the performance data S1 is read out from the DB 14b to
control the tone generator circuit 28, whereby the vowel "a"
starts to be generated at the time point T1. The control time
period at this time corresponds to the duration time designated
by the information of the transition s_a in the track Tp. As a
result, the phonetic unit "sa" is generated as the singing
voice SS1.
[0073] Following this, the tone generator control information
corresponding to the information of the vowel "a" in the track
Tp and the pitch information of C3 in the performance data S1
is read out from the phonetic unit DB 14a to control the tone
generator circuit 28, whereby the vowel "a" continues to be
generated. The control time period at this time corresponds to
the duration time designated by the information of the vowel
"a" in the track Tp. Then, the tone generator control
information corresponding to the information of the transition
a_i in the track Tp and the pitch information of D3 in the
performance data S2 is read out from the DB 14b to control the
tone generator circuit 28, whereby the generation of the vowel
"a" is stopped and at the same time the generation of the vowel
"i" is started at the time point T2. The control time period at
this time corresponds to the duration time designated by the
information of the transition "a_i" in the track Tp.

[0074] Following this, similarly to the above, the tone
generator control information corresponding to the information
of the vowel "i" and the pitch information of D3 and one
corresponding to the information of a transition i_t in the
track Tp and the pitch information of D3 are sequentially read
out to control the tone generator circuit 28, whereby the
generation of the vowel "i" is continued until the time point
T31, and at this time point T31, the generation of the
consonant "t" is started. Then, after starting the generation
of the vowel "a" at the time point T3, based on the tone
generator control information corresponding to the information
of the transition t_a and the pitch information of E3, the tone
generator control information corresponding to the information
of the vowel a in the track Tp and the pitch information of E3
and one corresponding to the information of the transition
a_Sil in the track Tp and the pitch information of E3 are
sequentially read out to control the tone generator circuit 28,
whereby the generation of the vowel "a" is continued until the
time point T4, and at this time point T4, the state of silence
is started. As a result, as the singing voices SS2, SS3, the
phonetic units "i" and "ta" are sequentially generated.

[0075] In accordance with the generation of the singing voices
as described above, the singing voice control is carried out
based on the information in the performance data S1 to S3 and
the information in the transition track Tr. More specifically,
before and after the time point T1, the tone generator control
information corresponding to the state information of the
transition state Attack in the track Tr and the information of
the transition s_a in the track Tp is read out from the state
transition DB 14c in FIG. 7 to control the tone generator
circuit 28, whereby minute changes in pitch, amplitude, and the
like are added to the singing voice "s_a". The control time
period at this time corresponds to the duration time designated
by the state information of the attack transition state Attack.
Further, before and after the time point T2, the tone generator
control information corresponding to the state information of
the note transition state NtN in the track Tr, the information
of the transition a_i in the track Tp, and the pitch
information D3 in the performance data S2 is read out from the
DB 14c to control the tone generator circuit 28, whereby minute
changes in pitch, amplitude, and the like are added to the
singing voice "a_i". The control time period at this time
corresponds to the duration time designated by the state
information of the note transition state NtN. Further,
immediately before the time point T4, the tone generator
control information corresponding to the state information of
the release transition state Release in the track Tr, the
information of the vowel a in the track Tp, and the pitch
information E3 in the performance data S3 is read out from the
DB 14c to control the tone generator circuit 28, whereby minute
changes in pitch, amplitude, and the like are added to the
singing voice "a". The control time period at this time
corresponds to the duration time designated by the state
information of the release transition state Release. According
to the singing voice control described above, it is possible to
synthesize natural singing voices with the feelings of attack,
note transition, and release.

[0076] Further, in accordance with generation of the singing
voices described above, the singing voice control is carried
out based on the information of the performance data S1 to S3,
and the information in the vibrato track Tg. More specifically,
at a time later than the time point T3 by the delay time DL,
the tone generator control information corresponding to the
information of a vibrato on event in the track Tg, the
information of the vowel a in the track Tp, and the pitch
information of E3 in the performance data S3 is read out from
the vibrato DB 14d shown in FIG. 8 to control the tone
generator circuit 28, whereby vibrato-like changes in pitch,
amplitude and the like are added to the singing voice "a", and
such addition is continued until the time point T4. The control
time period at this time corresponds to the duration time
designated by the information of the vibrato on event in the
track Tg. Further, the depth and speed of vibrato are
determined by the information of the vibrato type in the
performance data S3. According to the singing voice control
described above, it is possible to synthesize natural singing
voices by adding vibrato to desired portions of the singing.

[0077] Next, the performance data-receiving and singing voice
synthesis score-forming process will be described with
reference to FIG. 17.

[0078] In a step S50, the initialization of the system is
carried out, whereby, for example, the count n of a reception
counter in the RAM 16 is set to 0.
[0079] In a step S52, the count n of the reception counter is
incremented by 1 (n = n + 1). Then, in a step S54, a variable m
is set to the value or count n of the counter, and performance
data at an m-th (m = n) position in the sequence of performance
data (hereinafter simply referred to as the "m-th performance
data") is received and written into the receiving buffer in the
RAM 16.

[0080] In a step S56, it is determined whether or not the m-th
(m = n) performance data is at the end of the data, i.e. the
last data. If first (m = 1) data is received in the step S54,
the answer to the question of the step S56 becomes negative
(N), and hence the process proceeds to a step S58. In the step
S58, m-th (m = n) performance data is read out from the
receiving buffer and written into the reference score in the
RAM 16. It should be noted that once the first (m = 1)
performance data has been written into the reference score,
subsequent performance data are either added to or inserted
into the reference score, as described hereinabove with
reference to FIGS. 10 to 12.

[0081] Then, in a step S60, it is determined whether or not
n > 1 holds. If the first (m = 1) performance data has been
received, the answer to the question of the step S60 becomes
negative (N), so that the process returns to the step S52,
wherein the count n is incremented to 2, and in the following
step S54, second (m = 2) performance data is received and
written into the receiving buffer. Then, the process proceeds
via the step S56 to the step S58, wherein the second (m = 2)
performance data is added to the reference score.
[0082] Then, it is determined in the step S60 whether or not
n > 1 holds, and in the present case, since the count n is
equal to 2, the answer to this question becomes affirmative
(Y), so that the singing voice synthesis score-forming process
is carried out in a step S61. Although the process in the step
S61 will be described in detail with reference to FIG. 18, the
outline thereof can be described as follows: It is determined
in a step S62 whether or not m-th (m = n - 1) performance data
has been inserted into the reference score. For example, since
the m-th (m = 1) performance data has not been inserted but
simply written into the reference score, the answer to the
question of the step S62 becomes negative (N), so that the
process proceeds to a step S64, wherein a singing voice
synthesis score is formed concerning the m-th (m = n - 1)
performance data. For example, when the second (m = 2)
performance data is received in the step S54, a singing voice
synthesis score is formed concerning the first (m = 1)
performance data in the step S64.
[0083] After the processing in the step S64 is completed, the
process returns to the step S52, wherein similarly to the
above, the reception of performance data and writing of the
received performance data into the reference score are carried
out. For example, after the singing voice synthesis score is
formed concerning the first (m = 1) performance data in the
step S64, third (m = 3) performance data is received in the
step S54, and in the step S58, this data is added to or
inserted into the reference score.

[0084] If the answer to the question of the step S62 is
affirmative (Y), this means that m-th (m = n - 1) performance
data has been inserted into the reference score, so that the
process proceeds to a step S66, wherein singing voice synthesis
scores whose actual singing-starting time points are later than
that of the m-th (m = n - 1) performance data are discarded,
and singing voice synthesis scores are newly formed concerning
the m-th (m = n - 1) data and performance data subsequent
thereto in the reference score. For example, assuming that
after receiving performance data S1, S3, S4, as shown in FIGS.
11 and 12, performance data S2 is received, the m-th (m = 4)
performance data S2 is inserted into the reference score in the
step S58. Then, the process proceeds via the step S60 to the
step S62, and since the third (m = 4 - 1 = 3) performance data
S4 has been added to the reference score, the answer to the
question of the step S62 becomes negative (N), so that the
process returns via the step S64 to the step S52. Then, after
receiving fifth (m = 5) performance data in the step S54, the
process proceeds via the steps S56, S58, S60 to the step S62,
wherein since the fourth (m = 4) performance data S2 has been
inserted into the reference score, the answer to the question
of this step becomes affirmative (Y), so that the process
proceeds to the step S66, wherein singing voice synthesis
scores (SCa^ etc. in FIG. 12) whose actual singing-starting
time points are later than that of the fourth (m = 4)
performance data are discarded, and singing voice synthesis
scores are newly formed concerning the fourth (m = 4)
performance data and subsequent performance data in the
reference score (S2, S3, S4 in FIG. 12).

[0085] After the processing in the step S66 is completed, the
process returns to the step S52, and processing similar to the
above is repeatedly carried out. When the m-th (m = n)
performance data is at the end of the data, the answer to the
question of the step S56 becomes affirmative (Y), and in a step
S68, a terminating process (e.g. addition of end information)
is carried out. The execution of the step S68 is followed by
the singing voice-synthesizing process being carried out in the
step S44 in FIG. 3.
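
The control flow of FIG. 17 described in paragraphs [0078] to
[0085] may be easier to follow as a short sketch. The Python
below is only an illustration under stated assumptions, not the
implementation of the embodiment: the function name
`form_scores`, the `(data, inserted)` input pairs, and the
returned action log are all hypothetical, and the end of the
data is modelled simply as the input running out.

```python
def form_scores(stream):
    """Replay the FIG. 17 loop over (data, inserted) pairs.

    `inserted` is True when the data had to be inserted into the
    reference score rather than appended (hypothetical input format).
    Returns a log of the score-forming actions that would be taken.
    """
    received = []   # performance data in order of reception
    log = []
    n = 0           # reception counter, set to 0 in step S50
    for data, inserted in stream:
        n += 1                              # step S52
        received.append((data, inserted))   # steps S54 and S58
        if n > 1:                           # step S60
            prev, prev_inserted = received[n - 2]   # m = n - 1
            if prev_inserted:               # step S62 answers Y
                # step S66: discard later scores, re-form from prev onward
                log.append(("rebuild_from", prev))
            else:
                # step S64: form the score for the previous data only
                log.append(("form", prev))
    log.append(("terminate", None))         # step S68
    return log


if __name__ == "__main__":
    order = [("S1", False), ("S3", False), ("S4", False),
             ("S2", True), ("S5", False)]
    for action in form_scores(order):
        print(action)
```

With the reception order S1, S3, S4, S2 used in paragraph
[0084], the sketch forms scores for S1, S3 and S4 in turn and
re-forms the scores from S2 onward only after the fifth data
arrives, which mirrors the one-data look-ahead of the process.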

[0086] FIG. 18 shows the singing voice synthesis score-forming
process. First, in a step S70, performance data containing
performance information shown in FIG. 4 is obtained from the
reference score. In a step S72, the performance information
contained in the obtained performance data is analyzed. In a
step S74, based on the analyzed performance information and the
stored management data (management data of preceding
performance data), management data for forming the singing
voice synthesis score is prepared. The processing in the step
S74 will be described in detail hereinafter with reference to
FIG. 19.
[0087] Then, in a step S76, it is determined whether or not the
obtained performance data has been inserted into the reference
score when it has been written into the reference score. If the
answer to this question is affirmative (Y), in a step S78,
singing voice synthesis scores whose actual singing-starting
time points are later than that of the obtained performance
data are discarded.

[0088] When the processing in the step S78 is completed or if
the answer to the question of the step S76 is negative (N), the
process proceeds to a step S80, wherein a phonetic unit
track-forming process is carried out. This process in the step
S80 forms a phonetic unit track Tp based on performance data,
the management data formed in the step S74, and the stored
score data (score data of the preceding performance data). The
details of the process will be described hereinafter with
reference to FIG. 22.
[0089] In a step S82, a transition track Tr is formed based on
the performance information, the management data formed in the
step S74, the stored score data, and the phonetic unit track
Tp. The details of the process in the step S82 will be
described hereinafter with reference to FIG. 34.

[0090] In a step S84, a vibrato track Tg is formed based on the
performance information, the management data formed in the step
S74, the stored score data, and the phonetic unit track Tp. The
details of the process in the step S84 will be described
hereinafter with reference to FIG. 37.

[0091] In a step S86, score data for the next performance data
is formed based on the performance information, the management
data formed in the step S74, the phonetic unit track Tp, the
transition track Tr, and the vibrato track Tg, and stored. The
score data contains an NtN transition time length from the
preceding vowel. As shown in FIG. 36, the NtN transition time
length consists of a combination of a time length T1 of the
preceding note (preceding vowel) and a time length T2 of the
following note (present performance data), with the boundary
between the two time lengths being held as time slot
information. To calculate the NtN transition time length, the
state transition time length of the note transition state NtN
corresponding to phonetic units, pitch, and a note transition
type (e.g. "normal") in the performance information is read
from the state transition DB 14c shown in FIG. 7, and this
state transition time length is multiplied by the singing note
transition expansion/compression ratio in the performance data.
The NtN transition time length obtained as the result of
multiplication is used as the duration time information in the
state information of note transition state NtN, shown in FIGS.
13 and 15.

[0092] FIG. 19 shows the management data-forming process. The
management data includes, as shown in FIGS. 20 and 21, items of
information of a phonetic unit state (PhU state), a phoneme,
pitch, current note on, current note duration, current note
off, full duration, and an event state.

[0093] When the performance data is obtained in a step S90, at
the following step S92, the singing phonetic unit in the
performance data is analyzed. The information of a phonetic
unit state represents a combination of a consonant and a vowel,
a vowel alone, or a voiced consonant alone. In the following,
for convenience, the combination of a consonant and a vowel
will be referred to as PhU State = Consonant Vowel and the
vowel alone or the voiced consonant alone as PhU State = Vowel.
The information of a phoneme represents the name of a phoneme
(name of a consonant and/or name of a vowel), the category of
the consonant (nasal sound, plosive sound, half vowel, etc.),
whether the consonant is voiced or unvoiced, and so forth.
[0094] In a step S94, the pitch of a singing voice in the
performance data is analyzed, and the analyzed pitch of the
singing voice is set as the pitch information "Pitch". In a
step S96, the actual singing time in the performance data is
analyzed, and the actual singing-starting time point of the
analyzed actual singing time is set as the current note-on
information "Current Note On". Further, the actual singing
length is set as the current note duration information "Current
Note Duration", and a time point later than the actual
singing-starting time point by the actual singing length is set
as the current note-off information "Current Note Off".
[0095] As the current note-on information, the time point
obtained by modifying the actual singing-starting time point
may be employed. For example, a time point (t0 ± Δt, where t0
indicates the actual singing-starting time point) obtained by
randomly changing the actual singing-starting time point
through a random number-generating process or the like, by Δt
within a predetermined time range (indicated by two broken
lines in FIGS. 20 and 21) before and after the actual
singing-starting time point (indicated by a solid line in FIGS.
20 and 21) may be set as the current note-on information.
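
As a rough illustration of the random modification of the
note-on time point described in paragraph [0095], the sketch
below draws a deviation within a fixed range around t0. It is
only an example under assumptions: the function name
`jittered_note_on`, the parameter `max_dev` (half the width of
the predetermined time range), and the choice of a uniform
distribution are not taken from the source, which only calls
for "a random number-generating process or the like".

```python
import random


def jittered_note_on(t0, max_dev, rng=None):
    """Return a note-on time within [t0 - max_dev, t0 + max_dev].

    t0      : actual singing-starting time point
    max_dev : assumed half-width of the predetermined time range
    rng     : optional random.Random instance for reproducibility
    """
    if rng is None:
        rng = random.Random()
    # Uniform deviation is an assumption; any bounded scheme would do.
    return t0 + rng.uniform(-max_dev, max_dev)


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(jittered_note_on(1.0, 0.02, rng), 4) for _ in range(3)])
```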
[0096] In a step S98, by using the management data of preceding
performance data, the singing time points of the present
performance data are analyzed. In the management data of the
preceding performance data, the information "Preceding Event
Number" represents the number of preceding performance data
received, of which the rearrangement has been completed. The
data "Preceding Score Data" is score data formed and stored in
the step S86 when a singing voice synthesis score was formed
concerning the preceding performance data. The information
"Preceding Note Off" represents a time point at which the
preceding actual singing should be terminated. The information
"Event State" represents a state of connection (whether silence
is interposed) between a preceding singing event and a current
singing event determined based on the information "Preceding
Note Off" and the current note-on information. In the
following, for convenience, a state in which the current
singing event is continuous from the preceding singing event
(i.e. without silence), as shown in FIG. 20, will be indicated
by Event State = Transition, and a state in which silence is
interposed between the preceding singing event and the current
singing event, as shown in FIG. 21, will be indicated by Event
State = Attack. The information "Full Duration" represents a
time length between a time point designated by the information
"Preceding Note Off" at which the preceding actual singing
should be terminated and a time designated by the current
note-off information "Current Note Off" at which the current
actual singing should be terminated.
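
The management data items of paragraphs [0092] to [0096] can be
pictured as a small record. The following sketch is a
simplified illustration, not the embodiment's data format: the
class and function names are hypothetical, a phonetic unit is
passed as a tuple of phoneme names, and the rule "silence is
interposed when Current Note On is later than Preceding Note
Off" is an assumed reading of how Event State is decided.

```python
from dataclasses import dataclass


@dataclass
class ManagementData:
    phu_state: str              # "Consonant Vowel" or "Vowel"
    pitch: str
    current_note_on: float
    current_note_duration: float
    current_note_off: float
    full_duration: float
    event_state: str            # "Attack" or "Transition"


def form_management_data(phonemes, pitch, start, length, preceding_note_off):
    """Build management data for one piece of performance data.

    phonemes           : e.g. ("s", "a") or ("a",)  (hypothetical format)
    start, length      : actual singing-starting time point and singing length
    preceding_note_off : time at which the preceding singing should end
    """
    # Two phonemes are treated as consonant + vowel; a single phoneme
    # (vowel or voiced consonant alone) as PhU State = Vowel.
    phu_state = "Consonant Vowel" if len(phonemes) == 2 else "Vowel"
    note_off = start + length
    # Assumed rule: a gap after the preceding note-off means silence.
    event_state = "Attack" if start > preceding_note_off else "Transition"
    return ManagementData(
        phu_state=phu_state,
        pitch=pitch,
        current_note_on=start,
        current_note_duration=length,
        current_note_off=note_off,
        full_duration=note_off - preceding_note_off,
        event_state=event_state,
    )


if __name__ == "__main__":
    print(form_management_data(("s", "a"), "C3", 1.0, 0.5, 0.75))
```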
[0097] Next, the phonetic unit track-forming process will be
described with reference to FIG. 22. In a step S100,
performance information (contents of performance data), the
management data and the score data are obtained. In a step
S102, a phonetic unit transition time length is obtained (read
out) from the phonetic unit transition DB 14b shown in FIG. 6B
based on the obtained data. The details of the processing in
the step S102 will be described hereinafter with reference to
FIG. 23.

[0098] In a step S104, based on the management data, it is
determined whether or not Event State = Attack holds. If the
answer to this question is affirmative (Y), it means that
preceding silence exists, and in a step S106, a silence singing
length is calculated. The details of the processing in the step
S106 will be described hereinafter with reference to FIG. 24.

[0099] If the answer to the determination in the step S104 is
negative (N), it means that Event State = Transition holds, and
hence a preceding vowel exists, so that in a step S108, a
preceding vowel singing length is calculated. The details of
the process in the step S108 will be described hereinafter with
reference to FIG. 28.
[0100] When the processing in the step S106 or S108 is
completed, in a step S110, a vowel singing length is
calculated. The details of the processing in the step S110 will
be described hereinafter with reference to FIG. 32.

[0101] FIG. 23 shows the phonetic unit transition time
length-acquisition process carried out in the step S102.
[0102] In a step S112, management data and score data are
obtained. Then, in a step S114, all phonetic unit transition
time lengths (phonetic unit transition time lengths obtained in
steps S116, S122, S124, S126, S130, S132, S134, all referred to
hereinafter) are initialized.

[0103] In a step S116, a phonetic unit transition time length
of V_Sil (vowel to silence) is retrieved from the DB 14b based
on the management data. Assuming, for example, that the vowel
is "a", and the pitch of the vowel is "P1", the phonetic unit
transition time length corresponding to "a_Sil" and "P1" is
retrieved from the DB 14b. The processing in the step S116 is
related to the fact that in the Japanese language syllables
terminate in a vowel.

[0104] In a step S118, based on the management data, it is
determined whether or not Event State = Attack holds. If the
answer to this question is affirmative (Y), it is determined
based on the management data in a step S120 whether or not PhU
State = Consonant Vowel holds. If the answer to this question
is affirmative (Y), a phonetic unit transition time length of
Sil_C (silence to consonant) is retrieved from the DB 14b based
on the management data in a step S122. Thereafter, in a step
S124, based on the management data, a phonetic unit transition
time length of C_V (consonant to vowel) is retrieved from the
DB 14b.
[0105] If the answer to the question of the step S120 is
negative (N), it means that PhU State = Vowel holds, so that in
a step S126, a phonetic unit transition time length of Sil_V is
retrieved from the DB 14b based on the management data. It
should be noted that the details of the manner of retrieving
the transition time lengths at the respective steps S122 to
S126 are the same as described as to the step S116.

[0106] If the answer to the question of the step S118 is
negative (N), similarly to the step S120, it is determined in a
step S128 whether or not PhU State = Consonant Vowel holds. If
the answer to this question is affirmative (Y), in a step S130,
based on the management data and the score data, a phonetic
unit transition time length of pV_C (preceding vowel to
consonant) is retrieved from the DB 14b. Assuming, for example,
that the score data indicates that the preceding vowel is "a",
and the management data indicates that the consonant is "s" and
its pitch is "P2", a phonetic unit transition time length
corresponding to "a_s" and "P2" is retrieved from the DB 14b.
Thereafter, in a step S132, similarly to the step S116, a
phonetic unit transition time length of C_V (consonant to
vowel) is retrieved from the DB 14b based on the management
data.

[0107] If the answer to the question of the step S128 is
negative (N), the process proceeds to a step S134, wherein
similarly to the step S130, a phonetic unit transition time
length of pV_V (preceding vowel to vowel) is retrieved from the
DB 14b based on the management data and the score data.
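
The branching of FIG. 23 in paragraphs [0103] to [0107] amounts
to choosing which transition entries to look up. The sketch
below shows only that selection logic; the function name
`transitions_to_retrieve` and the returned list of transition
type labels are hypothetical, and the actual lookup in the DB
14b (keyed by phoneme names and pitch) is left out.

```python
def transitions_to_retrieve(event_state, phu_state):
    """Return the transition types retrieved for one performance data.

    event_state : "Attack" or "Transition"
    phu_state   : "Consonant Vowel" or "Vowel"
    """
    kinds = ["V_Sil"]                        # step S116, always retrieved
    if event_state == "Attack":              # step S118
        if phu_state == "Consonant Vowel":   # step S120
            kinds += ["Sil_C", "C_V"]        # steps S122, S124
        else:
            kinds += ["Sil_V"]               # step S126
    else:                                    # Event State = Transition
        if phu_state == "Consonant Vowel":   # step S128
            kinds += ["pV_C", "C_V"]         # steps S130, S132
        else:
            kinds += ["pV_V"]                # step S134
    return kinds


if __name__ == "__main__":
    for es in ("Attack", "Transition"):
        for ps in ("Consonant Vowel", "Vowel"):
            print(es, "/", ps, "->", transitions_to_retrieve(es, ps))
```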
[0108] FIG. 24 shows the silence singing length-calculating
process carried out in the step S106.
[0109] First, in a step S136, performance data, management data
and score data are obtained. In a step S138, it is determined
whether or not PhU State = Consonant Vowel holds. If the answer
to this question is affirmative (Y), in a step S140, a
consonant singing length is calculated. In this case, as shown
in FIG. 25, the consonant singing time is determined by adding
together a consonant portion of the silence-to-consonant
phonetic unit transition time length, the consonant singing
length, and a consonant portion of the consonant-to-vowel
phonetic unit transition time length. Accordingly, the
consonant singing length is part of the consonant singing time.
[0110] FIG. 25 shows an example of determination of the
consonant singing length carried out when the singing consonant
expansion/compression ratio contained in the performance
information is larger than 1. In this case, the sum of the
consonant length of Sil_C and the consonant length of C_V added
together is used as a basic unit, and this basic unit is
multiplied by the singing consonant expansion/compression ratio
to obtain the consonant singing length C. Then, the consonant
singing time is lengthened by interposing the consonant singing
length C between Sil_C and C_V.
[0111] FIG. 26 shows an example of determination of the
consonant singing length carried out when the singing consonant
expansion/compression ratio contained in the performance
information is smaller than 1. In this case, the consonant
length of Sil_C and the consonant length of C_V are each
multiplied by the singing consonant expansion/compression ratio
to shorten the respective consonant lengths. As a result, the
consonant singing time formed by the consonant length of Sil_C
and the consonant length of C_V is shortened.
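
The two cases of FIGS. 25 and 26 can be worked through
numerically. The sketch below reads paragraphs [0110] and
[0111] literally and should be taken as an illustration only:
the function name `consonant_singing_time` and its return
layout are hypothetical, and leaving the lengths unchanged for
a ratio of exactly 1 is an assumption, since the text covers
only ratios larger or smaller than 1.

```python
def consonant_singing_time(sil_c_cons, c_v_cons, ratio):
    """Apply the singing consonant expansion/compression ratio.

    sil_c_cons : consonant portion of the Sil_C transition time length
    c_v_cons   : consonant portion of the C_V transition time length
    Returns (first portion, inserted length C, second portion, total time).
    """
    if ratio > 1:
        # FIG. 25: basic unit times ratio gives the inserted length C,
        # which is interposed between Sil_C and C_V.
        c = (sil_c_cons + c_v_cons) * ratio
        first, second = sil_c_cons, c_v_cons
    elif ratio < 1:
        # FIG. 26: both consonant portions are shortened, nothing inserted.
        c = 0.0
        first, second = sil_c_cons * ratio, c_v_cons * ratio
    else:
        # Assumption: a ratio of exactly 1 leaves the lengths as retrieved.
        c = 0.0
        first, second = sil_c_cons, c_v_cons
    return first, c, second, first + c + second


if __name__ == "__main__":
    print(consonant_singing_time(0.25, 0.25, 2.0))
    print(consonant_singing_time(0.25, 0.25, 0.5))
```

The same arithmetic would apply to the preceding-vowel case of
FIGS. 29 and 30 with the consonant portion of pV_C in place of
that of Sil_C.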
[0112] In a step S142, the silence singing length is
calculated. As shown in FIG. 27, silence time is determined by
adding together a silence portion of a preceding
vowel-to-silence phonetic unit transition time length, a
silence singing length, a silence portion of a
silence-to-consonant phonetic unit transition time length, and
a consonant singing time, or by adding together a silence
portion of a preceding vowel-to-silence phonetic unit
transition time length, a silence singing length, and a silence
portion of a silence-to-vowel phonetic unit transition time
length. Therefore, the silence singing length is part of the
silence time. In the step S142, in accordance with the order of
singing, the silence singing length is calculated such that the
boundary between the consonant portion of C_V and the vowel
portion of the same, or the boundary between the silence
portion of Sil_V and the vowel portion of the same coincides
with the actual singing-starting time point (Current Note On).
In short, the silence singing length is calculated such that
the singing-starting time point of the vowel of the present
performance data coincides with the actual singing-starting
time point.

[0113] FIGS. 27A to 27C show phonetic unit connection patterns
different from each other. The pattern shown in FIG. 27A
corresponds to a case of a preceding vowel "a" - silence -
"sa", for example, in which to lengthen the consonant "s", the
consonant singing length C is inserted. The pattern shown in
FIG. 27B corresponds to a case of a preceding vowel "a" -
silence - "pa", for example. The pattern shown in FIG. 27C
corresponds to a case of a preceding vowel "a" - silence - "i",
for example.

[0114] FIG. 28 shows the preceding vowel singing
length-calculating process executed in the step S108.
[0115] First, in a step S146, performance data, management
data, and score data are obtained. In a step S148, it is
determined whether or not PhU State = Consonant Vowel holds. If
the answer to this question is affirmative (Y), in a step S150,
the consonant singing length is calculated. In this case, as
shown in FIG. 29, the consonant singing time is determined by
adding together a consonant portion of the preceding
vowel-to-consonant phonetic unit transition time length, a
consonant singing length, and a consonant portion of the
consonant-to-vowel phonetic unit transition time length.
Therefore, the consonant singing length is part of the
consonant singing time.

[0116] FIG. 29 shows an example of determination of the
consonant singing length carried out when the singing consonant
expansion/compression ratio contained in the performance
information is larger than 1. In this case, the sum of the
consonant length of pV_C and the consonant length of C_V added
together is used as a basic unit, and this basic unit is
multiplied by the singing consonant expansion/compression ratio
to obtain the consonant singing length C. Then, the consonant
singing time is lengthened by interposing the consonant singing
length C between pV_C and C_V.
[0117] FIG. 30 shows an example of determination of the
consonant singing length carried out when the singing consonant
expansion/compression ratio contained in the performance
information is smaller than 1. In this case, the consonant
length of pV_C and the consonant length of C_V are each
multiplied by the singing consonant expansion/compression ratio
to shorten the respective consonant lengths. As a result, the
consonant singing time formed by the consonant length of pV_C
and the consonant length of C_V is shortened.
[0118] Then, in a step S152, the preceding vowel singing length
is calculated. As shown in FIG. 31, a preceding vowel singing
time is determined by adding together a vowel portion of an X
(Sil, consonant, or vowel)-to-preceding vowel phonetic unit
transition time length, a preceding vowel singing length, and a
vowel portion of the preceding vowel-to-consonant or vowel
phonetic unit transition time length. Therefore, the preceding
vowel singing length is part of the preceding vowel singing
time. Further, the reception of the present performance data
makes definite the connection between the preceding performance
data and the present performance data, so that the vowel
singing length and V_Sil formed based on the preceding
performance data are discarded. More specifically, the
assumption that "silence is interposed between the present
performance data and the next performance data" for use in the
vowel singing length-calculating process in FIG. 32, described
hereinafter, is annulled. In the step S152, in accordance with
the order of singing, the preceding vowel singing length is
calculated such that the boundary between the consonant portion
of C_V and the vowel portion of the same, or the boundary
between the preceding vowel portion of pV_V and the vowel
portion of the same coincides with the actual singing-starting
time point (Current Note On). In short, the preceding vowel
singing length is calculated such that the singing-starting
time point of the vowel of the present performance data
coincides with the actual singing-starting time point.
[0119] FIGS. 31A to 31C show phonetic unit connection patterns
different from each other. The pattern shown in FIG. 31A
corresponds to a case of a preceding vowel "a" - "sa", for
example, in which to lengthen the consonant "s", the consonant
singing length C is inserted. The pattern shown in FIG. 31B
corresponds to a case of a preceding vowel "a" - "pa", for
example. The pattern shown in FIG. 31C corresponds to a case of
a preceding vowel "a" - "i", for example.
[0120] FIG. 32 shows the vowel singing length-calculating
process in the step S110.
[0121] First, in a step S154, performance information,
management data and score data are obtained. In a step S156,
the vowel singing length is calculated. In this case, until the
next performance data is received, a vowel connecting portion
is not made definite. Therefore, it is assumed that "silence is
interposed between the present performance data and the next
performance data", and the vowel singing length is calculated
by connecting V_Sil to the vowel portion as shown in FIG. 33.
At this time, the vowel singing time is temporarily determined
by adding together a vowel portion of an X-to-vowel phonetic
unit transition time length, a vowel singing length, and a
vowel portion of a vowel-to-silence phonetic unit transition
time length. Therefore, the vowel singing length becomes part
of the vowel singing time. In the step S156, in accordance with
the order of singing, the vowel singing length is calculated
such that the boundary between the vowel portion and silence
portion of V_Sil coincides with the actual singing end time
point (Current Note Off).
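
Under the two alignment conditions stated in the text (the
vowel starts at Current Note On, and the vowel/silence boundary
of V_Sil falls on Current Note Off), the vowel singing length
follows by subtraction. The sketch below is one possible reading
and not the embodiment's procedure: the function name
`vowel_singing_length` and its parameters are hypothetical, and
clamping a negative result to zero is an added assumption for
very short notes.

```python
def vowel_singing_length(note_on, note_off, vowel_part_x_v, vowel_part_v_sil):
    """Length of the steady vowel segment between two transitions.

    note_on          : Current Note On (vowel assumed to start here)
    note_off         : Current Note Off (vowel/silence boundary of V_Sil)
    vowel_part_x_v   : vowel portion of the X-to-vowel transition
    vowel_part_v_sil : vowel portion of the vowel-to-silence transition
    """
    # vowel singing time = X_V vowel part + singing length + V_Sil vowel part
    vowel_singing_time = note_off - note_on
    length = vowel_singing_time - vowel_part_x_v - vowel_part_v_sil
    # Assumption: never return a negative length for very short notes.
    return max(length, 0.0)


if __name__ == "__main__":
    print(vowel_singing_length(1.0, 2.0, 0.125, 0.25))
```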
[0122] When the next performance data is received, the state of
connection (Event State) between the present performance data
and the next performance data becomes definite, and if Event
State = Attack holds for the next performance data, the vowel
singing length of the present performance data is not updated,
while if Event State = Transition holds for the next
performance data, the vowel singing length of the present
performance data is updated by the process in the step S152
described above.

[0123] FIG. 34 shows the transition track-forming process
carried out in the step S82.
[0124] First, in a step S160, performance information,
management data, score data, and data of the phonetic unit
track are obtained. In a step S162, an attack transition time
length is calculated. To this end, the state transition time
length of an attack transition state Attack corresponding to a
singing attack type, a phonetic unit, and pitch, is retrieved
from the state transition DB 14c shown in FIG. 7 based on the
performance information and the management data. Then, the
retrieved state transition time length is multiplied by a
singing attack expansion/compression ratio in the performance
information to obtain the attack transition time length
(duration time of the attack portion).

[0125] In a step S164, a release transition time length
is calculated. To this end, the state transition time length
of a release transition state Release corresponding to a
singing release type, a phonetic unit, and pitch, is retrieved
from the state transition DB 14c based on the
performance information and the management data.
Then, the retrieved state transition time length is multiplied
by a singing release expansion/compression ratio
in the performance information to obtain the release
transition time length (duration time of the release portion).
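
The attack and release calculations in the steps S162 and S164 follow the same pattern: look up a stored length, then scale it. The sketch below illustrates that pattern only; the dictionary standing in for the state transition DB 14c, its keys, the stored values, and the function name are hypothetical and do not come from the embodiment.

```python
# Hypothetical stand-in for the state transition DB 14c:
# (state, type, phonetic unit, pitch) -> state transition time length in ms.
STATE_TRANSITION_DB = {
    ("Attack", "normal", "sa", 60): 120,
    ("Release", "normal", "sa", 60): 200,
}


def transition_time_length(state, kind, phonetic_unit, pitch, ratio):
    """Retrieve the stored length and apply the expansion/compression ratio."""
    base = STATE_TRANSITION_DB[(state, kind, phonetic_unit, pitch)]
    return base * ratio


# Attack stretched by 1.5, release compressed by 0.5 (illustrative ratios).
attack_len = transition_time_length("Attack", "normal", "sa", 60, 1.5)
release_len = transition_time_length("Release", "normal", "sa", 60, 0.5)
```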

[0126] In a step S166, an NtN transition time length is
obtained. More specifically, from score data stored in the 
step 86 in FIG. IB, the NtN transition time length from 
the preceding vowel (duration time of a note transition 
portion) is obtained. 

[0127] In a step S168, it is determined whether or not
Event State = Attack holds. If the answer to this question
is affirmative (Y), a NONE transition time length
corresponding to the silence portion (referred to as "NONEn
transition time length") is calculated in a step S170.
More specifically, in the case of PhU State = Consonant
Vowel, as shown in FIGS. 35A and 35B, the NONEn
transition time length is calculated such that the
singing-starting time point of the consonant coincides with an
attack transition-starting time point (leading end of the
attack transition time length). The FIG. 35A example differs
from the FIG. 35B example in that a consonant singing
length C is interposed in the consonant singing time.
In the case of PhU State = Vowel, as shown in FIG. 35C,
the NONEn transition time length is calculated such that
the singing-starting time point of the vowel coincides
with the attack transition-starting time point.
[0128] In the step S172, the NONE transition time
length corresponding to the steady portion (referred to
as "NONEs transition time length") is calculated. In this
case, until the next performance data is received, the 
state of connection following the NONEs transition time 
length is not made definite. Therefore, it is assumed that 
"silence is interposed between the present performance 
data and the next performance data", and as shown in 
FIGS. 35A to 35C, the NONEs transition time length is
calculated with the release transition connected thereto. 
More specifically, the NONEs transition time length is 
calculated such that a release transition end time point 
(trailing end of the release transition time length) coin- 
cides with an end time point of V_Sil, based on an end 
time point of the preceding performance data, the end 
time point of V_Sil, the attack transition time length, the 
release time length and the NONEn transition time 
length. 
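
For the Event State = Attack case, the two NONE lengths can be illustrated as simple interval arithmetic. The text does not spell out the formulas, so the sketch below assumes that the track consists of four contiguous segments (NONEn, attack, NONEs, release) starting at the end time point of the preceding performance data; the function and parameter names are hypothetical.

```python
def none_lengths_attack(prev_end, first_phoneme_start, v_sil_end,
                        attack_len, release_len):
    """Return (NONEn, NONEs) transition time lengths, all times in ms."""
    # NONEn: silence from the end of the preceding data up to the start of
    # the attack transition, which coincides with the singing-starting time
    # point of the first phoneme (consonant or vowel).
    none_n = first_phoneme_start - prev_end
    # NONEs: whatever remains so that the trailing end of the release
    # transition coincides with the end time point of V_Sil.
    none_s = v_sil_end - prev_end - none_n - attack_len - release_len
    return none_n, none_s


none_n, none_s = none_lengths_attack(0, 300, 2000, 120, 200)
```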

[0129] If the answer to the question of the step S168
is negative (N), in a step S174, a NONE transition time
length corresponding to the steady portion of the preceding
performance data (referred to as "pNONEs transition
time length") is calculated. Since the reception of
the present performance data has made definite the
state of connection with the preceding performance data,
the NONEs transition time length and the preceding
release transition time length formed based on the preceding
performance data are discarded. More specifically,
the assumption "silence is interposed between the
present performance data and the next performance data"
employed in the processing in a step S176, described
hereinafter, is annulled. In the step S174, as
shown in FIGS. 36A to 36C, in both of the cases of PhU
State = Consonant Vowel and PhU State = Vowel, the
pNONEs transition time length is calculated such that
the boundary between T1 and T2 of the NtN transition
time length from the preceding vowel coincides with the
actual singing-starting time point (Current Note On) of
the present performance data based on the actual
singing-starting time point and the actual singing end time
point of the present performance data and the NtN transition
time length. The FIG. 36A example differs from
the FIG. 36B example in that the consonant singing
length C is interposed in the consonant singing time.
[0130] In the step S176, the NONE transition time
length corresponding to the steady portion (NONEs
transition time length) is calculated. In this case, until
the next performance data is received, the state of connection
with the NONEs transition time length is not
made definite. Therefore, it is assumed that "silence is
interposed between the present performance data and
the next performance data", and as shown in FIGS. 36A
to 36C, the NONEs transition time length is calculated
with the release transition connected thereto. More specifically,
the NONEs transition time length is calculated
such that the boundary between T1 and T2 of the NtN
transition time length continued from the preceding vowel
coincides with the actual singing-starting time point
(Current Note On) of the present performance data and
at the same time, the release transition end time point
(trailing end of the release transition time length) coincides
with the end time point of V_Sil, based on the actual
singing-starting time point of the present performance
data, the end time point of V_Sil, the NtN transition
time length continued from the preceding vowel, and the
release transition time length.
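
The Event State = Transition case can be illustrated in the same way. The sketch below assumes that the NtN transition time length splits into a part T1 before Current Note On and a part T2 after it, and that the segments are contiguous; `prev_steady_start` (the start of the preceding note's steady portion) and both function names are hypothetical.

```python
def p_none_s(prev_steady_start, note_on, ntn_t1):
    """pNONEs: steady portion of the preceding note, ending where NtN begins."""
    # The T1/T2 boundary sits on Current Note On, so NtN starts at note_on - T1.
    return (note_on - ntn_t1) - prev_steady_start


def none_s_transition(note_on, v_sil_end, ntn_t2, release_len):
    """NONEs: steady portion between the NtN end and the release start."""
    ntn_end = note_on + ntn_t2
    # The trailing end of the release transition coincides with the end of V_Sil.
    release_start = v_sil_end - release_len
    return release_start - ntn_end


p_len = p_none_s(400, 1000, 30)
s_len = none_s_transition(1000, 2000, 50, 200)
```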

[0131] FIG. 37 shows the vibrato track-forming proc- 
ess carried out in the step S84. 

[0132] First, in a step S180, performance information,
management data, score data, and data of a phonetic
unit track are obtained. In a step S182, it is determined
based on the obtained data whether or not the vibrato
event should be continued. If vibrato is started at the
actual singing-starting time point of the present performance
data, and at the same time the vibrato-added state
is continued from the preceding performance data, the
answer to this question is affirmative (Y), so that the
process proceeds to a step S184. On the other hand,
if, although vibrato is started at the actual singing-starting
time point of the present performance data, the
vibrato-added state is not continued from the preceding
performance data, or if vibrato is not started at the actual
singing-starting time point of the present performance
data, the answer to this question is negative (N), so that
the process proceeds to a step S188.
[0133] In many cases, vibrato is sung over a plurality
of performance data (notes). Even if vibrato is started at
the actual singing-starting time point of the present
performance data, there are two cases: a case as shown in
FIG. 38A in which the vibrato-added state is continued
from the preceding note, and a case as shown in FIGS. 38D
and 38E in which the vibrato is additionally started at the
actual singing-starting time point of the present note.
Similarly, as to the non-vibrato state (vibrato-non-added
state), there are also two cases: a case as shown in FIG. 38B
in which the non-vibrato state is continued from the
preceding note, and a case as shown in FIG. 38C in which the
non-vibrato state is started at the actual singing-starting
time point of the present note.

In the step S188, it is determined based on the obtained
data whether or not the non-vibrato event
should be continued. In the FIG. 38B case in which
the non-vibrato state is to be continued from the preceding
note, the answer to this question becomes
affirmative (Y), so that the process proceeds to a
step S190. On the other hand, in the FIG. 38C case
in which although the non-vibrato state is started at
the actual singing-starting time point of the present
note, this state is not continued from the preceding
note, or in the case where the non-vibrato state is
not started at the actual singing-starting time point
of the present note, the answer to the question of
the step S188 becomes negative (N), so that the
process proceeds to a step S194.

[0134] If the vibrato event is to be continued, in the
step S184, the preceding vibrato time length is discarded.
Then, in a step S186, a new vibrato time length is
calculated by connecting (adding) together the preceding
vibrato time length and a vibrato time length of vibrato
to be started at the actual singing-starting time
point of the present note. Then, the process proceeds
to the step S194.

[0135] If the non-vibrato event is to be continued, in
the step S190, the preceding non-vibrato event time
length is discarded. Then, a new non-vibrato event time
length is calculated by connecting (adding) together the
preceding non-vibrato time length and a non-vibrato
time length of non-vibrato to be started at the actual
singing-starting time point of the present note. Then, the
process proceeds to the step S194.
[0136] In the step S194, it is determined whether or
not the vibrato time length should be added. If the answer
to this question is affirmative (Y), first, in a step
S196, a non-additional vibrato time length is calculated.
More specifically, a non-vibrato time length from the
trailing end of the vibrato time length calculated in the
step S186 to a vibrato time length to be added is calculated
as the non-additional vibrato time length.



[0137] Then, in a step S198, an additional vibrato time
length is calculated. Then, the process returns to the
step S194, wherein the above-described process is repeated.
This makes it possible to add a plurality of additional
vibrato time lengths.

[0138] If the answer to the question of the step S194
is negative (N), the non-vibrato time length is calculated
in a step S200. More specifically, a time period from the
final time point of a final vibrato event to the end time
point of V_Sil within the actual singing time length (time
length between Current Note On to Current Note Off) is
calculated as the non-vibrato time length.
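
The vibrato track-forming loop of the steps S182 to S200 can be sketched roughly as follows. This is a simplified illustration under assumptions: vibrato requests are given as hypothetical `(start, length)` pairs in milliseconds, the continuation of a non-vibrato event (steps S188 and S190) is omitted, and the event representation is invented for the example.

```python
def build_vibrato_track(note_on, v_sil_end, vibratos, carried_over=0):
    """Return a list of (kind, length) events for one note.

    vibratos     -- (start, length) pairs sorted by start time (hypothetical)
    carried_over -- vibrato time length continued from the preceding note
    """
    events = []
    cursor = note_on
    for index, (start, length) in enumerate(vibratos):
        if index == 0 and carried_over and start == note_on:
            # Steps S184/S186: the preceding length is discarded and replaced
            # by the joined (added) vibrato time length.
            events.append(("vibrato", carried_over + length))
        else:
            if start > cursor:
                # Step S196: gap before the vibrato to be added.
                events.append(("non_vibrato", start - cursor))
            # Step S198: the additional vibrato time length.
            events.append(("vibrato", length))
        cursor = start + length
    if v_sil_end > cursor:
        # Step S200: from the final vibrato event to the end of V_Sil.
        events.append(("non_vibrato", v_sil_end - cursor))
    return events


track = build_vibrato_track(0, 1000, [(0, 200), (500, 100)], carried_over=300)
```

With these inputs the first vibrato is merged with the 300 ms carried over from the preceding note, a second vibrato is added after a gap, and the remainder up to the end of V_Sil becomes a non-vibrato event.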
[0139] Although in the above steps S142 to S152, the
silence singing length or the preceding vowel singing
length is calculated such that the singing-starting time
point of the vowel of the present performance data coincides
with the actual singing-starting time point, this is
not limitative, but for the purpose of synthesizing more
natural singing voices, the silence singing length, the
preceding vowel singing length and the vowel singing
length may be calculated as in (1) to (11) described below:

(1) For each of categories (unvoiced/voiced plosive
sound, unvoiced/voiced fricative sound, nasal
sound, half vowel, etc.) of consonants, a silence
singing length, a preceding vowel singing length,
and a vowel singing length are calculated. FIGS.
39A to 39E show examples of calculation of the silence
singing length, showing that in the case where
the consonant belongs to nasal sound or half vowel,
the manner of determination of the silence singing
length is made different from the other cases.

The phonetic unit connection pattern shown in
FIG. 39A corresponds to a case of the preceding
vowel "a" - silence - "sa". The silence singing length
is calculated with the consonant singing length C
being inserted to lengthen the consonant ("s" in this
example) of a phonetic unit formed by a consonant
and a vowel. The phonetic unit connection pattern
shown in FIG. 39B corresponds to a case of the preceding
vowel "a" - silence - "pa". The silence singing
length is calculated without the consonant singing
length being inserted for a phonetic unit formed by
a consonant and a vowel. The phonetic unit connection
pattern shown in FIG. 39C corresponds to
a case of the preceding vowel "a" - silence - "na".
The silence singing length is calculated with the
consonant singing length C being inserted to
lengthen the consonant ("n" in this example) of a
phonetic unit formed by a consonant (nasal sound
or half vowel) and a vowel. The phonetic unit connection
pattern shown in FIG. 39D is the same as
the FIG. 39C example except that the consonant
singing length C is not inserted. The phonetic unit
connection pattern shown in FIG. 39E corresponds
to a case of the preceding vowel "a" - silence - "i".
The silence singing length is calculated for a phonetic
unit formed by vowels alone (the same applies
to a phonetic unit formed by consonants (nasal
sounds) alone).

In the examples shown in FIGS. 39A, 39B, and 
39E, the silence singing length is calculated such 
that the singing-starting time point of the vowel of 
the present performance data coincides with the actual
singing-starting time point. In the examples
shown in FIGS. 39C and 39D, the silence singing 
length is calculated such that the singing-starting 
time point of the consonant of the present perform- 
ance data coincides with the actual singing-starting 
time point. 

(2) For each of consonants ("p", "b", "s", "z", "n",
"w", etc.), a silence singing length, a preceding vowel
singing length, and a vowel singing length are calculated.

(3) For each of vowels ("a", "i", "u", "e", "o", etc.), a
silence singing length, a preceding vowel singing
length, and a vowel singing length are calculated.

(4) For each of the categories (unvoiced/voiced plosive
sound, unvoiced/voiced fricative sound, nasal
sound, half vowel, etc.) of consonants, and at the
same time for each vowel ("a", "i", "u", "e", "o", or
the like) continued from the consonant, a silence
singing length, a preceding vowel singing length
and a vowel singing length are calculated. That is,
for each combination of a category to which a consonant
belongs and a vowel, the silence singing
length, the preceding vowel singing length and the
vowel singing length are calculated.

(5) For each of the consonants ("p", "b", "s", "z", "n", 
"w", etc.), and at the same time for each vowel con- 
tinued from the consonant, a silence singing length, 
a preceding vowel singing length and a vowel sing- 
ing length are calculated. That is, for each combi- 
nation of a consonant and a vowel, the silence sing- 
ing length, the preceding vowel singing length and 
the vowel singing length are calculated. 

(6) For each of preceding vowels ("a", "i", "u", "e",
"o", etc.), a silence singing length, a preceding vowel
singing length, and a vowel singing length are calculated.

(7) For each of the preceding vowels ("a", "i", "u", 
"e", "o", etc.), and at the same time for each cate- 
gory (unvoiced/voiced plosive sound, unvoiced/ 
voiced fricative sound, nasal sound, half vowel, or 
the like) of a consonant continued from the preced- 
ing vowel, a silence singing length, a preceding 
vowel singing length and a vowel singing length are 
calculated. That is, for each combination of a pre- 
ceding vowel and a category to which a consonant 
belongs, the silence singing length, the preceding 
vowel singing length and the vowel singing length 
are calculated. 

(8) For each of the preceding vowels ("a", "i", "u",
"e", "o", etc.), and at the same time for each consonant
("p", "b", "s", "z", "n", "w", or the like) continued
from the preceding vowel, a silence singing length,
a preceding vowel singing length and a vowel singing
length are calculated. That is, for each combination
of a preceding vowel and a consonant, the
silence singing length, the preceding vowel singing
length and the vowel singing length are calculated.

(9) For each of the preceding vowels ("a", "i", "u",
"e", "o", etc.), and at the same time for each vowel
("a", "i", "u", "e", "o", or the like) continued from the
preceding vowel, a silence singing length, a preceding
vowel singing length and a vowel singing length
are calculated. That is, for each combination of a
preceding vowel and a vowel, the silence singing
length, the preceding vowel singing length and the
vowel singing length are calculated.

(10) For each of the preceding vowels ("a", "i", "u",
"e", "o", etc.), for each category (unvoiced/voiced
plosive sound, unvoiced/voiced fricative sound, nasal
sound, half vowel, or the like) of a consonant
continued from the preceding vowel, and for each
vowel ("a", "i", "u", "e", "o", or the like) continued
from the consonant, a silence singing length, a preceding
vowel singing length and a vowel singing
length are calculated. That is, for each combination
of a preceding vowel, a category to which a consonant
belongs, and a vowel, the silence singing
length, the preceding vowel singing length and the
vowel singing length are calculated.

(11) For each of the preceding vowels ("a", "i", "u",
"e", "o", etc.), for each consonant ("p", "b", "s", "z",
"n", "w", or the like) continued from the preceding
vowel, and for each vowel ("a", "i", "u", "e", "o", or
the like) continued from the consonant, a silence
singing length, a preceding vowel singing length
and a vowel singing length are calculated. That is,
for each combination of a preceding vowel, a consonant,
and a vowel, the silence singing length, the
preceding vowel singing length and the vowel singing
length are calculated.
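
The category-dependent alignment described in variation (1) and FIGS. 39A to 39E can be sketched as follows. This is a minimal illustration, not the embodiment's procedure: the set of nasal and half-vowel consonants contains only the two examples named in the text, and the function name and the `consonant_time` parameter (the consonant singing time, including the consonant singing length C when it is inserted) are hypothetical.

```python
# Consonants treated as nasal sound or half vowel (only the examples
# named in the text; a real table would be larger).
NASAL_OR_HALF_VOWEL = {"n", "w"}


def onset_times(note_on, consonant, consonant_time):
    """Return (consonant_start, vowel_start) in ms for one phonetic unit."""
    if consonant is None:
        # FIG. 39E: vowel alone, the vowel starts at the actual
        # singing-starting time point.
        return None, note_on
    if consonant in NASAL_OR_HALF_VOWEL:
        # FIGS. 39C and 39D: the consonant onset coincides with Note On.
        return note_on, note_on + consonant_time
    # FIGS. 39A and 39B: the vowel onset coincides with Note On, so the
    # consonant is sung ahead of it.
    return note_on - consonant_time, note_on


sa_times = onset_times(1000, "s", 80)
na_times = onset_times(1000, "n", 60)
```

The finer-grained variations (2) to (11) would replace the single membership test with a lookup keyed on the consonant, the vowel, the preceding vowel, or a combination of them.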


[0140] The present invention is by no means limited
to the embodiment described hereinabove by way of example,
but can be practiced in various modifications and
variations. Examples of such modifications and variations
include the following:

(1) Although in the above described embodiment,
after completing the forming of a singing voice synthesis
score, singing voices are synthesized according
to the singing voice synthesis score, this is
not limitative, but while forming a singing voice synthesis
score, singing voices may be synthesized
based on the formed portion of the score. To carry
this out, it is only required that, while preferentially
performing the reception of performance data by an
interrupt handling routine, the singing voice synthesis
score is formed based on the received portion
of the performance data.
(2) Although in the above embodiment, the formant-forming
method is employed for the tone generation
method, this is not limitative but a waveform
processing method or other suitable method may
be employed.

(3) Although in the above embodiment, the singing 
voice synthesis score is formed by three tracks of a 
phonetic unit track, a transition track and a vibrato 
track, this is not limitative, but the same may be 
formed by a single track. To this end, information of
the transition track and the vibrato track may be in- 
serted into the phonetic unit track, as required. 

[0141] It goes without saying that the above described
embodiment, modifications or variations may be realized
even in the form of a program as software to thereby
accomplish the object of the present invention.
[0142] Further, it also goes without saying that the object
of the present invention may be accomplished by
supplying a storage medium in which is stored software
program code executing the singing voice-synthesizing
method or realizing the functions of the singing
voice-synthesizing apparatus according to the above described
embodiment, modifications or variations, and
causing a computer (CPU or MPU) of the apparatus to
read out and execute the program code stored in the
storage medium.

[0143] In this case, the program code itself read out 
from the storage medium achieves the novel functions 
of the above embodiment, modifications or variations,
and the storage medium storing the program constitutes 
the present invention. 

[0144] The storage medium for supplying the program
code to the system or apparatus may be in the form of
a floppy disk, a hard disk, an optical memory disk, a
magneto-optical disk, a CD-ROM, a CD-R (CD-Recordable),
a DVD-ROM, a semiconductor memory, a magnetic
tape, a nonvolatile memory card, or a ROM, for example.
Further, the program code may be supplied from a
server computer via a MIDI apparatus or a communication
network.

[0145] Further, needless to say, not only the functions
of the above embodiment, modifications or variations
can be realized by carrying out the program code read
out by the computer but also an OS (operating system)
or the like operating on the computer can carry out part
or whole of actual processing in response to instructions
of the program code, thereby making it possible to implement
the functions of the above embodiment, modifications
or variations.
[0146] Furthermore, it goes without saying that after
the program code read out from the storage medium has
been written in a memory incorporated in a function extension
board inserted in the computer or in a function
extension unit connected to the computer, a CPU or the
like arranged in the function extension board or the function
extension unit may carry out part or whole of actual
processing in response to the instructions of the code
of the next program, thereby making it possible to
achieve the functions of the above embodiment, modifications
or variations.



Claims 

1. A singing voice-synthesizing method comprising
the steps of:

inputting phonetic unit information representative
of a phonetic unit, time information representative
of a singing-starting time point, and
singing length information representative of a
singing length, in timing earlier than the
singing-starting time point, for a singing phonetic
unit including a sequence of a first phoneme
and a second phoneme;

generating a phonetic unit transition time length
formed by a generation time length of the first
phoneme and a generation time length of the
second phoneme, based on the inputted phonetic
unit information;

determining a singing-starting time point and a
singing duration time of the first phoneme and
a singing-starting time point and a singing duration
time of the second phoneme, based on
the generated phonetic unit transition time
length, the inputted time information and singing
length information; and

starting generation of a first singing voice and
a second singing voice formed by the first phoneme
and the second phoneme at the
singing-starting time point of the first phoneme and the
singing-starting time point of the second phoneme,
respectively, and continuing generation
of the first singing voice and the second singing
voice for the singing duration time of the first
phoneme and the singing duration time of the
second phoneme, respectively.

2. A singing voice-synthesizing method according to
claim 1, wherein the determining step includes setting
the singing-starting time point of the first phoneme
to a time point earlier than the singing-starting
time point represented by the time information.

3. A singing voice-synthesizing apparatus comprising:

an input section that inputs phonetic unit information
representative of a phonetic unit, time
information representative of a singing-starting
time point, and singing length information representative
of a singing length, in timing earlier
than the singing-starting time point, for a phonetic
unit including a sequence of a first phoneme
and a second phoneme;

a storage section that stores a phonetic unit
transition time length formed by a generation
time length of the first phoneme and a generation
time length of the second phoneme;

a readout section that reads out the phonetic
unit transition time length from said storage
section based on the phonetic unit information
inputted by said input section;

a calculating section that calculates a
singing-starting time point and a singing duration time
of the first phoneme, and a singing-starting time
point and a singing duration time of the second
phoneme, based on the phonetic unit transition
time length read by said readout section and
the time information and the singing length information
which have been inputted by said input
section; and

a singing voice-synthesizing section that starts
generation of a first singing voice and a second
singing voice formed by the first phoneme and
the second phoneme at the singing-starting
time point of the first phoneme and the
singing-starting time point of the second phoneme calculated
by said calculating section, respectively,
and continuing generation of the first singing
voice and the second singing voice for the singing
duration time of the first phoneme and the
singing duration time of the second phoneme
calculated by said calculating section, respectively.

4. A singing voice-synthesizing apparatus according
to claim 3, wherein said input section inputs modifying
information for modifying the generation time
length of the first phoneme, and wherein said calculating
section modifies the generation time length
of the first phoneme in the phonetic unit transition
time length read by said readout section according
to the modifying information inputted by said input
section, and then calculates the singing-starting
time point and the singing duration time of the first
phoneme and the singing-starting time point and
the singing duration time of the second phoneme,
based on the phonetic unit transition time length including
the modified generation time length of the
first phoneme.

5. A singing voice-synthesizing method comprising
the steps of:

inputting phonetic unit information representative
of a phonetic unit, time information representative
of a singing-starting time point, and
singing length information representative of a
singing length, for a singing phonetic unit;

generating a state transition time length corresponding
to a rise portion, a note transition portion,
or a fall portion of the singing phonetic unit,
based on the inputted phonetic unit information;
and

generating a singing voice formed by the phonetic
unit, based on the phonetic unit information,
the time information, and the singing
length information which have been inputted,
the generating step including adding a change
in at least one of pitch and amplitude to the
singing voice during a time period corresponding
to the generated state transition time length.

6. A singing voice-synthesizing apparatus comprising:

an input section that inputs phonetic unit information
representative of a phonetic unit, time
information representative of a singing-starting
time point, and singing length information representative
of a singing length, for a singing
phonetic unit;

a storage section that stores state transition
time length corresponding to a rise portion, a
note transition portion, or a fall portion of the
singing phonetic unit;

a readout section that reads out the state transition
time length from said storage section
based on the phonetic unit information inputted
by said input section; and

a singing voice-synthesizing section that generates
a singing voice formed by the phonetic
unit, based on the phonetic unit information, the
time information, and the singing length information
which have been inputted by said input
section, said singing voice-synthesizing section
adding a change in at least one of pitch and
amplitude to the singing voice during a time period
corresponding to the state transition time
length read out by said readout section.

7. A singing voice-synthesizing apparatus according
to claim 6, wherein said input section inputs modifying
information for modifying the state transition
time length, and wherein the singing voice-synthesizing
apparatus includes a modifying section that
modifies the state transition time length read out by
said readout section based on the modifying information
inputted by said input section, and wherein
said singing voice-synthesizing section adds a
change in at least one of pitch and amplitude to the
singing voice during a time period corresponding to
the state transition time length modified by said
modifying section.

8. A singing sound-synthesizing apparatus comprising:

an input section that inputs phonetic unit information
representative of a phonetic unit, time
information representative of a singing-starting
time point, singing length information representative
of a singing length, and effects-imparting
information, for a singing phonetic unit;
and

a singing voice-synthesizing section that generates
a singing voice formed by the phonetic
unit, based on the phonetic unit information, the
time information, and the singing length information
which have been inputted by said input
section, said singing voice-synthesizing section
imparting effects to the singing voice based on
the effects-imparting information inputted by
said input section.

9. A singing sound-synthesizing apparatus according
to claim 8, wherein the effects-imparting information
inputted by said input section represents an
effects-imparting time period, wherein the singing
voice-synthesizing apparatus further comprises a setting
section that sets a new effects-imparting time period
corresponding to both the effects-imparting time
period represented by the effects-imparting information
and a second effects-imparting time period
of a singing phonetic unit preceding the singing phonetic
unit if the effects-imparting time period is continuous
from the second effects-imparting time period,
and wherein said singing voice-synthesizing
section imparts effects to the singing voice during
the new effects-imparting time period set by said
setting section.

10. A singing voice-synthesizing apparatus comprising:

an input section that inputs phonetic unit information
representative of a phonetic unit, time
information representative of a singing-starting
time point, and singing length information representative
of a singing length, for a singing
phonetic unit, in timing earlier than the
singing-starting time point;

a setting section that randomly sets a new
singing-starting time point, within a predetermined
time range extending before and after the
singing-starting time point, based on the time information
inputted by said input section; and

a singing voice-synthesizing section that generates
a singing voice formed by the phonetic
unit, based on the phonetic unit information and
the singing length information which have been
inputted by said input section, and the
singing-starting time point set by said setting section,
said singing voice-synthesizing section starting
generation of the singing sound at the new
singing-starting time point set by said setting section.

1 1. A storage medium storing a program for executing 
a singing volce-syntheslzing method, the program 
comprising: 
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40 
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50 



55 



an input module that inputs phonetic unit infor- 
mation representative of a phonetic unit, time 
infonnation representative of a singing-starting 
time point, and singing length information rep- 
resentative of a singing length, in timing earlier 
than the singing-starting time point, for a sing- 
ing phonetic unit including a sequence of a first 
phoneme and a second phoneme; 
a phonetic unit transition time length-generat- 
ing module that generates a phonetic unit tran- 
sition time length formed by a generation time 
length of the first phoneme and a generation 
time length of the second phoneme, based on 
the inputted phonetic unit information; 
a determining module that determines a singing-starting
time point and a singing duration time of the first
phoneme and a singing-starting time point and a singing
duration time of the second phoneme, based on the
generated phonetic unit transition time length, the
inputted time information and singing length
information; and

a singing voice-generating module that starts
generation of a first singing voice and a second
singing voice formed by the first phoneme and the
second phoneme at the singing-starting time point of
the first phoneme and the singing-starting time point
of the second phoneme, respectively, and continues
generation of the first singing voice and the second
singing voice for the singing duration time of the
first phoneme and the singing duration time of the
second phoneme, respectively.
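
The determining module of claim 11 can be sketched as follows. The claim does not fix a rule for deriving the per-phoneme timings, so this example assumes one simple rule: the second phoneme (typically a vowel) starts at the inputted singing-starting time point, and the first phoneme (typically a consonant) starts earlier by its own generation time length. The names `PhonemeTiming` and `determine_timings`, and the choice to make the second phoneme last at least its generation time length, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhonemeTiming:
    start: float     # singing-starting time point of the phoneme
    duration: float  # singing duration time of the phoneme


def determine_timings(note_start, note_length, first_len, second_len):
    """Return (first, second) phoneme timings for one phonetic unit.

    first_len and second_len together stand for the phonetic unit
    transition time length. Assumed rule (not from the claim):
    the first phoneme ends exactly where the second one begins,
    and the second one begins at the inputted start time.
    """
    first = PhonemeTiming(start=note_start - first_len, duration=first_len)
    # Assumption: the second phoneme is held for the singing length,
    # but never for less than its own generation time length.
    second = PhonemeTiming(start=note_start,
                           duration=max(note_length, second_len))
    return first, second


if __name__ == "__main__":
    first, second = determine_timings(1000.0, 500.0, 60.0, 40.0)
    print(first)
    print(second)
```

Under this rule the first phoneme is sounded before the inputted time point, which is only possible because the input module receives the data earlier than the singing-starting time point.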

12. A storage medium storing a program for executing 
a singing voice-synthesizing method, the program
comprising: 

an input module that inputs phonetic unit information
representative of a phonetic unit, time information
representative of a singing-starting time point, and
singing length information representative of a singing
length, for a singing phonetic unit;

a state transition time length-generating module that
generates a state transition time length corresponding
to a rise portion, a note transition portion, or a fall
portion of the singing phonetic unit, based on the
inputted phonetic unit information; and

a singing voice-generating module that generates
a singing voice formed by the phonetic unit, based on
the phonetic unit information, the time information,
and the singing length information which have been
inputted, the singing voice-generating module adding
a change in at least one of pitch and amplitude to the
singing voice during a time period corresponding to the

[FIG. 5, FIG. 6A, FIG. 6B, FIG. 7, FIG. 8]